Music Genre Classification using Deep Learning¶

Context¶

Objective¶

Train a deep learning model to classify songs into different music genres (e.g., Rock, Jazz, Pop, Classical).

Dataset¶

GTZAN Music Genre Dataset

  • Contains 1,000 audio tracks, each 30 seconds long

  • 10 genres: Blues, Classical, Country, Disco, Hip-Hop, Jazz, Metal, Pop, Reggae, and Rock

  • Each genre has 100 tracks

  • genres original - A collection of 10 genres with 100 audio files each, all 30 seconds long (the famous GTZAN dataset, often called the "MNIST of sounds")

  • images original - A visual representation of each audio file. One way to classify the data is with neural networks. Because NNs (such as the CNN we will use today) usually take some kind of image representation as input, the audio files were converted to Mel spectrograms to make this possible.

  • 2 CSV files - Containing features of the audio files. One file holds, for each 30-second song, the mean and variance of several features extracted from the audio. The other file has the same structure, but the songs were first split into 3-second clips, increasing the amount of data fed to our classification models tenfold. With data, more is always better.

    • features_30_sec.csv
    • features_3_sec.csv
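The described CSV layout (one row per clip, with a mean and a variance for every extracted feature, plus a label) can be sketched with a tiny hand-made DataFrame. The column names below are illustrative guesses based on the description above, not values read from the actual files.

```python
import pandas as pd

# Hypothetical rows mirroring the described layout: one row per clip,
# a mean/variance pair per feature, and a genre label.
rows = [
    ["blues.00000.wav", 0.35, 0.09, 1784.4, 166.1, "blues"],
    ["jazz.00000.wav", 0.28, 0.07, 1523.7, 142.9, "jazz"],
]
cols = ["filename", "chroma_stft_mean", "chroma_stft_var",
        "spectral_centroid_mean", "spectral_centroid_var", "label"]
df = pd.DataFrame(rows, columns=cols)
print(df.shape)  # (2, 6)
```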

Folder-to-genre mapping

  • 01 blues
  • 02 classical
  • 03 country
  • 04 disco
  • 05 hiphop
  • 06 jazz
  • 07 metal
  • 08 pop
  • 09 reggae
  • 10 rock

Importing the necessary libraries and loading the data¶

In [1]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [2]:
!pip install librosa
Requirement already satisfied: librosa in /usr/local/lib/python3.11/dist-packages (0.10.2.post1)
In [3]:
# For Audio Preprocessing
import librosa
import librosa.display as dsp
from IPython.display import Audio

# For Data Preprocessing
import pandas as pd
import numpy as np
import os

# For Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm

#The data is provided as a zip file
import zipfile
import os
In [4]:
sns.set_style("dark") # This sets the style of the plots to "dark", meaning the background of the plots will have a dark theme.

Load the Dataset¶

In [5]:
# Import Zip Files

path = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/archive.zip'

#The data is provided as a zip file so we need to extract the files from the zip file
with zipfile.ZipFile(path, 'r') as zip_ref:
    zip_ref.extractall()
In [6]:
import os

directory_path = "/content/data"

# List all files and directories
files = os.listdir(directory_path)

print("Files and directories in '/content/data':")
for file in files:
    print(file)
Files and directories in '/content/data':
07
02
09
08
06
04
01
03
10
05
In [7]:
# import os

# directory_path = "/content/data"

# # Walk through the directory
# for root, dirs, files in os.walk(directory_path):
#     print(f"Directory: {root}")
#     for file in files:
#         print(f"  - {file}")
Directory: /content/data, with subfolders 01-10; each contains 100 .wav files named 0_<folder>_<index>.wav (e.g., 0_07_74.wav), with index running from 0 to 99.

Extract

In [8]:
# import zipfile
# import os

# zip_path = "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/archive.zip"
# extract_path = "/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification"

# # Extract if not already extracted
# if not os.path.exists(os.path.join(extract_path, "data")):
#     with zipfile.ZipFile(zip_path, 'r') as zip_ref:
#         zip_ref.extractall(extract_path)
#     print("✅ Extraction Complete!")
# else:
#     print("⚠️ Files already extracted.")

Verify Extraction

In [9]:
# import os
# print("📂 Extracted Folders:", os.listdir(extract_path))

Audio Samples of spoken digits (0-9) of 50 different speakers.

Functions used to build the get_audio() function

  • .wav: a file format (like .csv) that stores raw audio samples. We will load the .wav files using the librosa package.
  • dsp.waveshow(): visualizes the waveform in the time domain. Depending on the zoom level, the plot alternates between a raw, sample-based view of the signal and an amplitude-envelope view. The sr parameter is the sampling rate, i.e., samples per second.
  • Audio(): from the IPython package; creates a playable audio object.
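Before reaching for librosa, it can help to see what a .wav file actually holds. The stdlib-only sketch below writes one second of a 440 Hz tone as 16-bit mono PCM and reads its header back; the file name tone.wav is just a scratch file for this example.

```python
import wave, struct, math

# Write one second of a 440 Hz tone as 16-bit mono PCM -- the raw-sample
# layout a .wav file stores (the notebook loads real clips with librosa.load).
sr = 22050
samples = [int(32767 * math.sin(2 * math.pi * 440 * n / sr)) for n in range(sr)]
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(sr)
    w.writeframes(struct.pack(f"<{len(samples)}h", *samples))

# Read the header back: sample rate and number of frames.
with wave.open("tone.wav", "rb") as w:
    rate, nframes = w.getframerate(), w.getnframes()
print(rate, nframes)  # 22050 22050
```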

In [18]:
def get_audio(digit=0):
    root_dir = "/content/data/"
    available_folders = [f for f in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, f))]

    if not available_folders:
        print(f"⚠️ No folders found in {root_dir}")
        return None

    # Pick a random folder
    sample_folder = np.random.choice(available_folders)
    folder_path = os.path.join(root_dir, sample_folder)

    # List available files for the chosen digit
    available_files = [f for f in os.listdir(folder_path) if f.split("_")[2].startswith(str(digit))]

    if not available_files:
        print(f"⚠️ No files found for digit {digit} in {folder_path}")
        return None

    # Pick a random file
    file_name = np.random.choice(available_files)
    file_path = os.path.join(folder_path, file_name)

    # Load and display audio
    data, sample_rate = librosa.load(file_path, sr=22050)
    librosa.display.waveshow(data, sr=sample_rate)
    plt.show()

    # Return a playable audio widget (Audio was imported from IPython.display above)
    return Audio(data, rate=sample_rate)
In [19]:
# Show the audio and plot of digit 0
get_audio(0)
[waveform plot]
Out[19]:
[audio player]
In [20]:
# Show the audio and plot of digit 1
get_audio(1)
[waveform plot]
Out[20]:
[audio player]
In [21]:
# Show the audio and plot of digit 2
get_audio(2)
[waveform plot]
Out[21]:
[audio player]
In [22]:
# Show the audio and plot of digit 9
get_audio(9)
[waveform plot]
Out[22]:
[audio player]

Visualizing the spectrogram of the audio data¶

  • A spectrogram is a visual representation of a signal's strength, or "loudness", over time at the various frequencies present in a waveform. It gives a detailed view of the audio by representing amplitude, frequency, and time in a single plot. Since spectrograms are continuous plots, they can be interpreted as images. Different kinds of spectrograms place different attributes on their axes and are interpreted differently. In research and development settings, a vocoder (a model that converts spectrograms back to audio using parameters learned by machine learning) closes the loop; the WaveNet vocoder, used in almost all text-to-speech architectures, is a well-known example.
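The idea behind a spectrogram can be sketched without librosa: slice the signal into overlapping frames, window each frame, and take FFT magnitudes, giving one column per time step. This is a bare-bones illustration with a synthetic tone, not the mel-scaled spectrogram used later.

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=256):
    # Slice into overlapping frames, apply a Hann window, and take the
    # magnitude of each frame's FFT -> one column per time step.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # (frames, freq bins)
    return 20 * np.log10(mag.T + 1e-10)         # dB, (freq bins, frames)

sr = 22050
t = np.arange(sr) / sr                          # 1 second of audio
tone = np.sin(2 * np.pi * 440 * t)              # a 440 Hz sine
spec = spectrogram(tone)
print(spec.shape)  # (257, 85): frequency bins x time frames
```

The energy concentrates in the frequency bin nearest 440 Hz (bin ≈ 440 * n_fft / sr), which is what makes the plot readable as an image.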
In [23]:
def get_audio_raw(digit=0):
    root_dir = "/content/data/"

    # Get all available folders in /content/data/
    available_folders = sorted(os.listdir(root_dir))

    # Pick a random folder
    sample_folder = np.random.choice(available_folders)
    folder_path = os.path.join(root_dir, sample_folder)

    # Get all files in the selected folder
    available_files = [f for f in os.listdir(folder_path) if f.endswith('.wav')]

    # Filter files that match the digit (third element in the filename)
    digit_files = [f for f in available_files if f.split("_")[2].startswith(str(digit))]

    if not digit_files:
        print(f"⚠️ No files found for digit {digit} in {folder_path}")
        return None, None  # Return None to avoid errors

    # Pick a random file
    file_name = np.random.choice(digit_files)
    file_path = os.path.join(folder_path, file_name)

    # Load audio only if the file exists
    if not os.path.exists(file_path):
        print(f"⚠️ File not found: {file_path}")
        return None, None  # Avoid loading a missing file

    # Load audio
    audio, sample_rate = librosa.load(file_path, sr=22050)

    return audio, sample_rate

Extracting features from the audio file

  • Mel-frequency cepstral coefficients (MFCCs) Feature Extraction

  • MFCCs are often the final features used in machine learning models trained on audio data. They are a set of mel coefficients computed for each time step, through which the raw audio can be encoded. For example, if an audio sample spans 30 time steps and each time step is described by 40 mel coefficients, the entire sample can be represented by 40 * 30 coefficients; rendered as a Mel spectrogram, this becomes a 2-D array with 40 rows and 30 columns.

  • In this step, we will first extract the mel coefficients for each audio file and add them to our dataset.

    • extract_features : Returns the MFCC features extracted from an audio file.
    • preprocess_and_create_dataset : Iterates through the audio files of each digit, extracts the features using the extract_features() function, and appends the data to a DataFrame.
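The mean pooling inside extract_features() reduces an (n_mfcc, n_frames) matrix to a single 40-dimensional vector per clip. A minimal sketch with a random stand-in matrix in place of real MFCC output:

```python
import numpy as np

# Stand-in for librosa.feature.mfcc output: 40 coefficients x 30 frames.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(40, 30))

# Mean pooling over time: transpose to (frames, coeffs) and average the
# frames, leaving one 40-dimensional feature vector for the whole clip.
pooled = np.mean(mfcc.T, axis=0)
print(pooled.shape)  # (40,)
```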

Creating a function that extracts the data from audio files

In [24]:
# Function to extract MFCC features from an audio file
def extract_features(file):
    try:
        # Load audio and its sample rate
        audio, sample_rate = librosa.load(file, sr=22050)

        # Extract MFCC features
        extracted_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)

        # Scale the extracted features (Mean pooling)
        extracted_features = np.mean(extracted_features.T, axis=0)

        return extracted_features

    except Exception as e:
        print(f"⚠️ Error processing {file}: {e}")
        return None  # Return None if error occurs

# Function to preprocess the dataset
def preprocess_and_create_dataset():
    root_folder_path = "/content/data/"
    dataset = []

    # Iterate through folders (01-10)
    for folder in tqdm(range(1, 11)):

        print(f'\nProcessing folder: {folder}')

        # Ensure folder names are formatted as '01', '02', ..., '10'
        folder_name = f"{folder:02d}"
        folder_path = os.path.join(root_folder_path, folder_name)

        # Ensure folder exists
        if not os.path.exists(folder_path):
            print(f"⚠️ Skipping missing folder: {folder_path}")
            continue

        # Iterate through files in the folder
        for file in tqdm(os.listdir(folder_path)):
            abs_file_path = os.path.join(folder_path, file)

            # Extract features
            extracted_features = extract_features(abs_file_path)

            # Skip if feature extraction failed
            if extracted_features is None:
                continue

            # Extract class label (corrected)
            try:
                class_label = int(file.split("_")[1])  # Extracts second element (digit class)
            except ValueError:
                print(f"⚠️ Skipping file {file} due to incorrect format.")
                continue

            # Append to dataset
            dataset.append([extracted_features, class_label])

    # Convert dataset to DataFrame
    df = pd.DataFrame(dataset, columns=['features', 'class'])

    # Convert 'features' column to a NumPy array for efficiency
    df['features'] = df['features'].apply(lambda x: np.array(x))

    return df
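Since the label parsing in preprocess_and_create_dataset() is the easiest place to slip up, here is file.split("_")[1] in isolation on two hypothetical filenames that follow the 0_<folder>_<index>.wav pattern seen in /content/data:

```python
# The class label is the middle field of the filename, exactly as
# preprocess_and_create_dataset() parses it with int(file.split("_")[1]).
names = ["0_07_74.wav", "0_02_5.wav"]
labels = [int(name.split("_")[1]) for name in names]
print(labels)  # [7, 2]
```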

Create the dataset using the defined function

Walking through preprocess_and_create_dataset() step by step for a single file

In [25]:
# # Set folder number
# root_folder_path = "/content/data/"
# folder = os.path.join(root_folder_path, "0" + str(1))
In [26]:
# folder
In [27]:
# # Set path
# file = os.listdir(folder)[0]
# abs_file_path = os.path.join(folder, file)
In [28]:
# abs_file_path
In [29]:
# # Extract features using mel-frequency coefficient
# audio, sample_rate = librosa.load(abs_file_path)

# extracted_features = librosa.feature.mfcc(y = audio, sr = sample_rate, n_mfcc = 40)
In [30]:
# audio.shape, sample_rate
In [31]:
# print(f'This audio file last {audio.shape[0]/sample_rate} seconds')
In [32]:
# extracted_features.shape
#   # n_mfcc = 40)
#   # n_frames: column
In [33]:
# # Increase the printed number of columns.
# np.set_printoptions(linewidth=150)
In [34]:
# # Scale the extracted features
# extracted_features = np.mean(extracted_features.T, axis = 0)
In [35]:
# np.set_printoptions(linewidth=100)
# extracted_features
In [36]:
# extracted_features.shape
In [37]:
# # Class label
# class_label = file[0]
In [38]:
# class_label
In [39]:
# dataset = []

# # Append a list where the feature represents a column and class of the digit represents another column
# dataset.append([extracted_features, class_label])
In [40]:
# dataset[0][0]
In [41]:
# dataset[0][1]

In [42]:
# %%time
# # Create the dataset by calling the function
# dataset = preprocess_and_create_dataset()
In [43]:
dataset = preprocess_and_create_dataset()
print(dataset.head())
print(dataset['class'].value_counts())  # Check if multiple classes are detected correctly
  0%|          | 0/10 [00:00<?, ?it/s]
Processing folder: 1
Processing folder: 2
Processing folder: 3
Processing folder: 4
Processing folder: 5
Processing folder: 6
<ipython-input-24-55e539c04948>:5: UserWarning: PySoundFile failed. Trying audioread instead.
  audio, sample_rate = librosa.load(file, sr=22050)
/usr/local/lib/python3.11/dist-packages/librosa/core/audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)
⚠️ Error processing /content/data/06/0_06_50.wav: 
Processing folder: 7
Processing folder: 8
Processing folder: 9
Processing folder: 10
100%|██████████| 10/10 [01:26<00:00,  8.63s/it]
[Per-file tqdm progress bars trimmed; each folder of 100 files took roughly 7-10 s.]
                                            features  class
0  [-288.7327, 105.90115, 18.776207, 23.682646, 5...      1
1  [-107.203255, 88.49289, -4.1719, 55.477848, -8...      1
2  [-159.5804, 69.806015, -4.402107, 76.845116, 3...      1
3  [-95.44059, 105.23433, -26.953482, 60.816486, ...      1
4  [-350.35263, 169.53174, 31.771353, 16.71844, 2...      1
class
1     100
2     100
3     100
4     100
5     100
7     100
8     100
9     100
10    100
6      99
Name: count, dtype: int64


View first 5 rows of the data

In [44]:
# View the head of the DataFrame
dataset.head()
Out[44]:
features class
0 [-288.7327, 105.90115, 18.776207, 23.682646, 5... 1
1 [-107.203255, 88.49289, -4.1719, 55.477848, -8... 1
2 [-159.5804, 69.806015, -4.402107, 76.845116, 3... 1
3 [-95.44059, 105.23433, -26.953482, 60.816486, ... 1
4 [-350.35263, 169.53174, 31.771353, 16.71844, 2... 1
In [45]:
dataset.shape
Out[45]:
(999, 2)
In [46]:
dataset.dtypes
Out[46]:
features    object
class        int64
dtype: object

In [47]:
# Ensure the class labels are stored as integers
dataset['class'] = [int(x) for x in dataset['class']] # convert from object to integer
In [48]:
# Check the frequency of classes in the dataset
dataset['class'].value_counts()
Out[48]:
count
class
1 100
2 100
3 100
4 100
5 100
7 100
8 100
9 100
10 100
6 99
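Class 6 (jazz) has only 99 tracks because one corrupted file (`0_06_50.wav`) was skipped during extraction, which also explains the dataset shape of (999, 2). The same balance check can be done without pandas; a stdlib-only sketch with a toy label list:

```python
from collections import Counter

# Toy labels standing in for dataset['class']; class 6 is one short,
# mirroring the skipped corrupted file in folder 06.
labels = [1] * 3 + [2] * 3 + [6] * 2
counts = Counter(labels)
print(counts)  # Counter({1: 3, 2: 3, 6: 2})

# Classes that fall short of the full per-class count:
full = max(counts.values())
short = sorted(c for c, n in counts.items() if n < full)
print(short)  # [6]
```

A near-balanced dataset like this needs no resampling; the one missing track has a negligible effect on training.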

Visualizing the Mel Frequency Cepstral Coefficients Using a Spectrogram¶

  • draw_spectrograms : for a given audio file, extract the Mel-frequency cepstral coefficients and return them as a 2-D array that can then be plotted with time on the X-axis and the corresponding MFCC coefficients at each time step on the Y-axis.
In [49]:
# Function to extract MFCCs; the caller plots them with librosa.display.specshow
def draw_spectrograms(audio_data, sample_rate):

    # Extract MFCC features
    extracted_features = librosa.feature.mfcc(y=audio_data, sr=sample_rate, n_mfcc=40)

    # Return features without scaling
    return extracted_features

Plot the MFCCs. Note that it is difficult to tell what kind of signal is hiding behind such a representation just by looking at it.

In [50]:
# Creating subplots
fig, ax = plt.subplots(5, 2, figsize = (15, 30))

# Initializing row and column variables for subplots
row = 0
column = 0

for digit in range(10):

    # Get the audio of different classes (0-9)
    audio_data, sample_rate = get_audio_raw(digit)

    # Extract their MFCC
    mfcc = draw_spectrograms(audio_data, sample_rate)
    print(f"Shape of MFCC of audio digit {digit} ---> ", mfcc.shape)

    # Display the plots and its title
    ax[row,column].set_title(f"MFCC of audio class {digit} across time")
    librosa.display.specshow(mfcc, sr = 22050, ax = ax[row, column])

    # Set X-labels and Y-labels
    ax[row,column].set_xlabel("Time")
    ax[row,column].set_ylabel("MFCC Coefficients")

    # Conditions for positioning of the plots
    if column == 1:
        column = 0
        row += 1
    else:
        column+=1

plt.tight_layout(pad = 3)
plt.show()
Shape of MFCC of audio digit 0 --->  (40, 1298)
Shape of MFCC of audio digit 1 --->  (40, 1293)
Shape of MFCC of audio digit 2 --->  (40, 1293)
Shape of MFCC of audio digit 3 --->  (40, 1320)
Shape of MFCC of audio digit 4 --->  (40, 1301)
Shape of MFCC of audio digit 5 --->  (40, 1293)
Shape of MFCC of audio digit 6 --->  (40, 1293)
Shape of MFCC of audio digit 7 --->  (40, 1293)
Shape of MFCC of audio digit 8 --->  (40, 1293)
Shape of MFCC of audio digit 9 --->  (40, 1293)
No description has been provided for this image
  • Each image represents the MFCC spectrogram for one of the audio classes (genres 0-9).

  • MFCCs (Mel Frequency Cepstral Coefficients) capture spectral properties of sound, useful for distinguishing different audio patterns.

  • X-axis represents time (progression of audio signal).

  • Y-axis represents MFCC coefficients (features extracted from the sound frequencies).

  • The MFCC spectrograms you generated show how the spectral energy (frequency content) of audio changes over time. Here's how to interpret the colors:

    • Red & Orange Areas → Higher energy (louder sounds)
    • Blue Areas → Lower energy (quieter sounds)
    • Horizontal Bands → Harmonic content (musical tones)
    • Vertical Variations → Changes in rhythm, beat, and sound texture
  • Folders "01" to "10" represent different genres (each class is one genre)

    • MFCC spectrograms you generated represent the spectral features of each genre.
  • Each genre has unique spectral characteristics:

    • Classical/Jazz: More smooth and continuous patterns.
    • Rock/Metal: More irregular and dynamic shifts in intensity.
    • Electronic/EDM: Clear periodic patterns from synthesized beats.
    • Hip-Hop/Rap: Strong low-frequency presence (bass-heavy).

Improve data visualization of the image above:

  • Change the subplot size.
  • Use cmap input option in librosa.display.specshow
In [51]:
# Creating subplots
fig, ax = plt.subplots(2, 5, figsize = (15, 7))

# Initializing row and column variables for subplots
row = 0
column = 0

for digit in range(10):

    # Get the audio of different classes (0-9)
    audio_data, sample_rate = get_audio_raw(digit)

    # Extract their MFCC
    mfcc = draw_spectrograms(audio_data, sample_rate)
    print(f"Shape of MFCC of audio digit {digit} ---> ", mfcc.shape)

    # Display the plots and its title
    ax[row,column].set_title(f"Class {digit}")
    librosa.display.specshow(mfcc, sr = 22050, ax = ax[row, column], cmap='tab20') # cmap='tab20' is the difference from the previous cell

    # Set X-labels and Y-labels
    ax[row,column].set_xlabel("Time")
    ax[row,column].set_ylabel("MFCC Coefficients")

    # Conditions for positioning of the plots
    if row == 1:
        row = 0
        column += 1
    else:
        row+=1

fig.suptitle('MFCC of different audio class')
plt.tight_layout(pad=1)
plt.show()
Shape of MFCC of audio digit 0 --->  (40, 1293)
Shape of MFCC of audio digit 1 --->  (40, 1293)
Shape of MFCC of audio digit 2 --->  (40, 1293)
Shape of MFCC of audio digit 3 --->  (40, 1293)
Shape of MFCC of audio digit 4 --->  (40, 1293)
Shape of MFCC of audio digit 5 --->  (40, 1303)
Shape of MFCC of audio digit 6 --->  (40, 1293)
Shape of MFCC of audio digit 7 --->  (40, 1293)
Shape of MFCC of audio digit 8 --->  (40, 1293)
Shape of MFCC of audio digit 9 --->  (40, 1293)
No description has been provided for this image
  • Observations on the MFCC Spectrograms Across Music Genres

    • Each plot represents the Mel Frequency Cepstral Coefficients (MFCCs) for different music genres. MFCCs capture the frequency characteristics of the audio signal, which helps in identifying unique patterns across genres.
  • Color Interpretation

    • The color variations in the spectrograms represent different frequency intensities over time.
    • The "tab20" colormap (used in the second plot) is a qualitative palette: it maps coefficient values to a set of distinct colors rather than a smooth gradient, so it groups value ranges into bands instead of showing a continuous intensity scale.
    • Some spectrograms appear denser and more uniform, while others are more sparse and structured, indicating differences in harmonic and rhythmic complexity.
  • Genre-Specific Observations

    • 01 - Blues
      • Dark brown, dense spectrogram with scattered frequency variations.
      • Blues music typically has strong mid-range frequencies with steady rhythms.
      • Expect repeating patterns due to the classic 12-bar blues structure.
    • 02 - Classical
      • Lighter, structured frequency content.
      • Classical music features rich harmonics and smooth variations.
      • Sparse high-frequency content due to dominant string and orchestral instruments.
    • 03 - Country
      • Grayish pattern with noticeable separations.
      • Country music typically has clear vocals and acoustic instruments.
      • Expect steady mid-range energy with occasional high-frequency bursts (e.g., from string plucks).
    • 04 - Disco
      • Dense, uniform color patterns indicating strong rhythmic beats.
      • Disco is bass-heavy with consistent mid and high-range frequencies.
      • Expect periodic peaks due to the dance beat structure.
    • 05 - Hip-hop
      • Sparse frequency distribution with dominant low and mid-range energy
      • Hip-hop often has bass-heavy beats, with sharp peaks for percussive elements (kick & snare drums).
      • Less harmonic complexity compared to classical or jazz.
    • 06 - Jazz
      • Balanced distribution with noticeable high-frequency components.
      • Jazz has complex harmonic structures with frequent chord changes.
      • Expect instrumental variation (saxophones, trumpets, pianos) contributing to rich harmonics.
    • 07 - Metal
      • Highly intense, dense spectrogram.
      • Metal is distorted guitar-heavy with aggressive high-frequency components.
      • Expect strong mid-high frequency dominance due to power chords & cymbals.
    • 08 - Pop
      • High-energy spectrogram with widespread frequency coverage.
      • Pop songs typically have clear vocals and electronic beats.
      • Expect consistency in spectral features due to polished production.
    • 09 - Reggae
      • More sparse compared to pop & metal, with distinct rhythm-based separations.
      • Reggae has a relaxed beat structure, often with emphasis on offbeat rhythms.
      • Mid-range dominance with occasional sharp high-frequency peaks.
    • 10 - Rock
      • Fairly dense spectrogram, not as intense as metal but still featuring strong mid-range energy.
      • Rock music has consistent drum beats, electric guitars, and vocals.
      • Expect noticeable frequency variation based on instrumentation.
  • Summary

    • Classical and Jazz have smoother, structured harmonic-rich spectrograms.
    • Blues, Rock, and Country share some mid-range similarities but differ in rhythmic structure.
    • Metal and Hip-hop show intense frequency dominance in different areas.
    • Pop and Disco exhibit structured, periodic patterns due to strong rhythmic consistency.
    • Reggae has more gap-separated rhythmic structures, making it unique.

Perform Train-Test-Split¶

Split the data into train and test sets

In [52]:
# # Import train_test_split function
# from sklearn.model_selection import train_test_split

# X = np.array(dataset['features'].to_list())
# Y = np.array(dataset['class'].to_list()) # Target

# # Create train set and test set
# X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size = 0.75, shuffle = True, random_state = 8)
In [53]:
# Import train_test_split function
from sklearn.model_selection import train_test_split

X = np.array(dataset['features'].to_list())
Y = np.array(dataset['class'].to_list()) - 1  # Fix: Convert from 1-10 to 0-9

# Create train set and test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.75, shuffle=True, random_state=8)
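
The `- 1` shift matters because a 10-unit softmax trained with sparse_categorical_crossentropy expects integer labels in the range 0-9; a label of 10 would index past the last output neuron. A quick NumPy check of the conversion:

```python
import numpy as np

# Folder labels run 1..10; the network's output indices run 0..9
folder_labels = np.array([1, 5, 10])
class_indices = folder_labels - 1

print(class_indices)  # [0 4 9]
```

Without this fix, every "rock" (label 10) sample would raise an out-of-range error or silently corrupt the loss.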
In [54]:
# Checking the shape of the data
X_train.shape
Out[54]:
(749, 40)
In [55]:
X_train
Out[55]:
array([[-5.7759789e+01,  9.5766479e+01, -1.0109319e+01, ...,
        -3.7151713e+00, -4.5247555e+00, -3.6879799e+00],
       [-9.4268730e+01,  9.1890869e+01, -1.7601385e+01, ...,
         2.3584796e-01, -1.9522644e+00, -5.6698909e+00],
       [-3.0045297e+02,  9.9252052e+01,  5.3022732e+01, ...,
        -1.8707280e+00, -8.2629883e-01, -1.4659938e+00],
       ...,
       [-2.2630965e+02,  7.8283951e+01,  7.8239198e+00, ...,
        -1.5126579e+00, -3.6971474e-01, -3.4288461e+00],
       [-1.0856484e+02,  6.9971283e+01,  1.4885138e+01, ...,
        -2.9707897e+00, -1.7937195e+00, -2.3009326e+00],
       [-1.0951281e+02,  9.7391228e+01, -2.0617918e+01, ...,
        -1.3636816e+00,  1.3630954e+00, -1.4500473e+00]], dtype=float32)
In [56]:
Y_train
Out[56]:
array([2, 9, 8, 7, 5, 6, 4, 9, 6, 8, 0, 3, 3, 7, 0, 4, 1, 0, 2, 7, 5, 9,
       6, 6, 8, 5, 5, 2, 7, 5, 7, 3, 8, 3, 0, 8, 3, 1, 0, 7, 9, 2, 4, 1,
       3, 4, 8, 2, 9, 4, 1, 9, 6, 9, 2, 7, 4, 8, 8, 9, 8, 1, 2, 3, 9, 1,
       5, 4, 5, 8, 5, 5, 5, 7, 4, 3, 9, 7, 9, 0, 4, 2, 6, 3, 2, 9, 6, 6,
       9, 0, 1, 5, 7, 4, 6, 5, 5, 8, 7, 2, 0, 0, 3, 0, 8, 9, 4, 1, 7, 0,
       3, 1, 6, 9, 6, 5, 8, 5, 0, 4, 4, 7, 9, 5, 8, 3, 0, 4, 1, 8, 1, 2,
       6, 1, 6, 3, 1, 1, 1, 5, 2, 4, 4, 3, 5, 0, 6, 7, 6, 5, 2, 3, 6, 6,
       0, 2, 2, 0, 0, 9, 5, 8, 4, 9, 9, 8, 7, 5, 3, 3, 0, 7, 1, 5, 5, 2,
       3, 2, 6, 2, 2, 5, 3, 8, 7, 3, 1, 3, 0, 0, 2, 8, 7, 9, 7, 5, 2, 0,
       4, 8, 9, 1, 0, 6, 7, 4, 7, 2, 1, 2, 7, 2, 8, 7, 0, 6, 9, 5, 1, 5,
       2, 6, 3, 9, 0, 3, 9, 6, 7, 7, 5, 1, 0, 9, 4, 6, 9, 7, 3, 1, 1, 8,
       7, 2, 6, 5, 3, 6, 6, 6, 6, 8, 3, 3, 6, 7, 1, 6, 6, 8, 5, 6, 0, 6,
       9, 7, 1, 9, 3, 2, 3, 5, 7, 5, 8, 8, 5, 7, 1, 1, 4, 8, 8, 8, 7, 7,
       0, 9, 2, 7, 0, 5, 2, 9, 9, 7, 2, 9, 4, 0, 8, 3, 9, 8, 0, 0, 0, 5,
       9, 0, 9, 9, 3, 9, 1, 6, 9, 7, 5, 0, 0, 8, 3, 4, 1, 4, 9, 9, 9, 4,
       2, 9, 2, 4, 7, 0, 2, 3, 0, 0, 0, 2, 2, 1, 3, 5, 2, 0, 8, 7, 4, 5,
       7, 0, 6, 5, 2, 7, 5, 8, 5, 2, 9, 8, 5, 4, 3, 0, 7, 3, 9, 2, 7, 2,
       7, 9, 9, 8, 0, 5, 5, 5, 1, 9, 7, 1, 4, 9, 2, 4, 8, 3, 2, 5, 3, 8,
       9, 8, 6, 1, 7, 1, 0, 6, 2, 1, 2, 8, 9, 5, 4, 4, 2, 1, 7, 9, 0, 5,
       7, 4, 1, 9, 8, 6, 5, 5, 4, 9, 7, 3, 3, 8, 9, 8, 6, 5, 1, 6, 8, 6,
       8, 7, 6, 1, 3, 1, 5, 7, 0, 6, 4, 0, 8, 3, 2, 0, 3, 7, 0, 1, 5, 6,
       6, 1, 2, 7, 2, 7, 2, 4, 7, 4, 0, 2, 0, 0, 3, 0, 1, 2, 3, 7, 9, 6,
       8, 6, 0, 4, 1, 2, 9, 3, 3, 0, 6, 2, 9, 8, 2, 1, 3, 0, 3, 0, 6, 6,
       9, 7, 3, 2, 6, 4, 1, 0, 6, 6, 2, 4, 2, 8, 5, 8, 1, 3, 4, 5, 5, 5,
       4, 4, 6, 2, 4, 7, 4, 8, 9, 2, 4, 3, 1, 4, 4, 8, 8, 2, 3, 6, 3, 8,
       8, 1, 9, 1, 7, 1, 2, 4, 4, 8, 7, 3, 2, 4, 4, 3, 3, 2, 4, 8, 6, 6,
       0, 0, 9, 9, 9, 1, 9, 5, 6, 5, 2, 0, 2, 4, 1, 8, 2, 9, 1, 7, 5, 1,
       7, 7, 6, 4, 5, 3, 9, 7, 1, 8, 4, 7, 9, 5, 7, 4, 4, 1, 5, 1, 1, 6,
       9, 5, 5, 3, 3, 9, 8, 5, 1, 6, 4, 4, 2, 1, 2, 0, 4, 6, 8, 6, 4, 7,
       0, 6, 2, 2, 0, 2, 7, 1, 6, 7, 7, 1, 2, 1, 5, 3, 9, 0, 4, 7, 3, 3,
       0, 8, 1, 3, 6, 9, 0, 6, 4, 3, 5, 4, 4, 3, 2, 6, 6, 2, 3, 1, 5, 1,
       4, 0, 9, 8, 3, 5, 7, 0, 7, 5, 6, 0, 0, 5, 5, 4, 3, 4, 7, 8, 0, 4,
       3, 7, 0, 8, 3, 4, 8, 1, 8, 8, 8, 9, 9, 5, 7, 4, 9, 1, 2, 5, 3, 3,
       3, 1, 0, 9, 2, 6, 6, 4, 1, 9, 8, 0, 8, 0, 7, 3, 8, 1, 9, 1, 3, 3,
       4])

Artificial Neural Networks (ANNs)¶

Modelling¶

Create an artificial neural network to recognize the music genre.

About the libraries:

  • Keras: Keras is an open-source deep-learning library in Python. Keras is popular because its API is clean and simple, allowing standard deep learning models to be defined, fit, and evaluated in just a few lines of code.
  • Sklearn :
    • Simple and efficient tools for predictive data analysis
    • Accessible to everybody, and reusable in various contexts
    • Built on NumPy, SciPy, and matplotlib
    • Open source, commercially usable

Import necessary libraries for building the model

In [57]:
# To create an ANN model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# To create a checkpoint and save the best model
# from tensorflow.keras.callbacks import ModelCheckpoint

# To load the model
from tensorflow.keras.models import load_model

# Input
from tensorflow.keras import Input

# To evaluate the model
from sklearn.metrics import classification_report, confusion_matrix
# from sklearn.preprocessing import LabelBinarizer

Model Creation¶

  • When we convert audios to their corresponding spectrograms, similar audios produce similar spectrograms regardless of the speaker's identity, pitch, or timbre. Moreover, since we feed the model a fixed-length vector of averaged MFCCs rather than a full 2-D image, local spatial structure is not a concern here, so stacking convolutional layers on top of our fully connected layers would only add computational redundancy.

  • We will use a Sequential model with multiple fully connected hidden layers, and a softmax output layer that returns a probability for each of the 10 genre classes.

    • A Sequential model is a linear stack of layers. Sequential models can be created by giving a list of layer instances.
    • A dense layer of neurons is a simple layer of neurons in which each neuron receives input from all of the neurons in the previous layer.
    • The most popular function employed for hidden layers is the rectified linear activation function, or ReLU activation function. It's popular because it's easy to use and effective in getting around the limitations of other popular activation functions like Sigmoid and Tanh.
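
ReLU simply zeroes out negative activations and passes positive ones through unchanged, which avoids the saturation that flattens gradients in Sigmoid and Tanh. A one-line NumPy sketch:

```python
import numpy as np

def relu(x):
    # max(0, x) elementwise: negatives become 0, positives pass through
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ]
```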

The input shape specifies the shape of the input data. Setting it correctly is important to ensure that your model can process the data.

In [58]:
# Create a Sequential object
model1 = Sequential()

# Set input shape
model1.add(Input(shape=(40, )))

#---------------------------------------------------------------------------------Hidden Layer --------------------------------------
# Add first hidden layer with 100 neurons (use a width equal to or greater than the 40 input features)
model1.add(Dense(100, activation = 'relu'))

# Add second hidden layer with 100 neurons to the Sequential object
model1.add(Dense(100, activation = 'relu'))

# Add third hidden layer with 100 neurons to the Sequential object
model1.add(Dense(100, activation = 'relu'))

# ----------------------------------------------------------------------------Output Layer ------------------------------------
# Output layer with 10 neurons as it has 10 classes
model1.add(Dense(10, activation = 'softmax')) # multi-class classification
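
Softmax turns the 10 raw output scores into a probability distribution over the genres: each value lies in (0, 1) and the vector sums to 1, so the predicted class is the index with the largest probability. A numerically stable NumPy sketch:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(round(probs.sum(), 6))  # 1.0
print(np.argmax(probs))       # 0
```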
In [59]:
# Print Summary of the model
model1.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense (Dense)                        │ (None, 100)                 │           4,100 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 100)                 │          10,100 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 100)                 │          10,100 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_3 (Dense)                      │ (None, 10)                  │           1,010 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 25,310 (98.87 KB)
 Trainable params: 25,310 (98.87 KB)
 Non-trainable params: 0 (0.00 B)
  • CategoricalCrossentropy:
    • The labels must be provided in a one-hot representation, i.e., you one-hot encode your labels before feeding them into the model.
  • SparseCategoricalCrossentropy:
    • The labels must be provided as integers (no one-hot encoding needed).
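
The two losses compute the same quantity and differ only in label format. A NumPy sketch showing that the sparse (integer-label) and one-hot versions agree for a single sample:

```python
import numpy as np

probs = np.array([0.7, 0.2, 0.1])    # softmax output for 3 classes
label = 0                            # integer label (sparse form)
one_hot = np.array([1.0, 0.0, 0.0])  # same label, one-hot form

sparse_ce = -np.log(probs[label])
categorical_ce = -np.sum(one_hot * np.log(probs))

print(np.isclose(sparse_ce, categorical_ce))  # True
```

Since our classes are stored as plain integers 0-9, sparse_categorical_crossentropy is the natural choice here.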
In [60]:
# Compile the model
model1.compile(loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'],
              optimizer = 'adam')

Model Checkpoint & Training¶

In [61]:
%%time
# Set the number of epochs for training
num_epochs = 100

# Set the batch size for training
batch_size = 32

# Fit the model
model1.fit(X_train,
          Y_train,
          validation_data = (X_test, Y_test),
          epochs = num_epochs,
          batch_size = batch_size,
          verbose = 1)
Epoch 1/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 2s 16ms/step - accuracy: 0.1721 - loss: 7.4053 - val_accuracy: 0.2960 - val_loss: 2.3302
Epoch 2/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.3757 - loss: 1.9830 - val_accuracy: 0.3920 - val_loss: 1.8912
Epoch 3/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.4309 - loss: 1.6333 - val_accuracy: 0.3840 - val_loss: 1.7328
Epoch 4/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.4708 - loss: 1.4402 - val_accuracy: 0.5000 - val_loss: 1.6047
Epoch 5/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.5663 - loss: 1.2619 - val_accuracy: 0.4960 - val_loss: 1.4947
Epoch 6/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6301 - loss: 1.1299 - val_accuracy: 0.4800 - val_loss: 1.5686
Epoch 7/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6096 - loss: 1.1090 - val_accuracy: 0.4120 - val_loss: 1.6659
Epoch 8/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.5946 - loss: 1.1142 - val_accuracy: 0.4760 - val_loss: 1.6022
Epoch 9/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6413 - loss: 0.9965 - val_accuracy: 0.5200 - val_loss: 1.4391
Epoch 10/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7032 - loss: 0.8600 - val_accuracy: 0.4560 - val_loss: 1.6319
Epoch 11/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6890 - loss: 0.8508 - val_accuracy: 0.5000 - val_loss: 1.5144
Epoch 12/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7197 - loss: 0.7505 - val_accuracy: 0.5120 - val_loss: 1.4340
Epoch 13/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7658 - loss: 0.7366 - val_accuracy: 0.5360 - val_loss: 1.4602
Epoch 14/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.7646 - loss: 0.6575 - val_accuracy: 0.5160 - val_loss: 1.5311
Epoch 15/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7480 - loss: 0.7283 - val_accuracy: 0.5320 - val_loss: 1.5356
Epoch 16/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8244 - loss: 0.5501 - val_accuracy: 0.5360 - val_loss: 1.5338
Epoch 17/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8065 - loss: 0.5486 - val_accuracy: 0.5320 - val_loss: 1.5437
Epoch 18/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8285 - loss: 0.5388 - val_accuracy: 0.5440 - val_loss: 1.5237
Epoch 19/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - accuracy: 0.8305 - loss: 0.5253 - val_accuracy: 0.5160 - val_loss: 1.6184
Epoch 20/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7829 - loss: 0.5784 - val_accuracy: 0.5280 - val_loss: 1.6348
Epoch 21/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.8299 - loss: 0.5161 - val_accuracy: 0.5040 - val_loss: 1.7883
Epoch 22/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.7969 - loss: 0.5071 - val_accuracy: 0.5240 - val_loss: 1.6919
Epoch 23/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8248 - loss: 0.4730 - val_accuracy: 0.5360 - val_loss: 1.6463
Epoch 24/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8842 - loss: 0.3677 - val_accuracy: 0.5320 - val_loss: 1.6900
Epoch 25/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9007 - loss: 0.3588 - val_accuracy: 0.5640 - val_loss: 1.6553
Epoch 26/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8961 - loss: 0.3422 - val_accuracy: 0.5560 - val_loss: 1.6764
Epoch 27/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8942 - loss: 0.3223 - val_accuracy: 0.5280 - val_loss: 1.7465
Epoch 28/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8893 - loss: 0.3733 - val_accuracy: 0.5320 - val_loss: 1.7637
Epoch 29/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9029 - loss: 0.2968 - val_accuracy: 0.5160 - val_loss: 1.7675
Epoch 30/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9392 - loss: 0.2535 - val_accuracy: 0.5560 - val_loss: 1.8146
Epoch 31/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9239 - loss: 0.2727 - val_accuracy: 0.5240 - val_loss: 1.7801
Epoch 32/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9418 - loss: 0.2583 - val_accuracy: 0.5600 - val_loss: 1.7930
Epoch 33/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9562 - loss: 0.1966 - val_accuracy: 0.5480 - val_loss: 1.8136
Epoch 34/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.9549 - loss: 0.1962 - val_accuracy: 0.5360 - val_loss: 1.9375
Epoch 35/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9250 - loss: 0.2676 - val_accuracy: 0.5240 - val_loss: 1.9816
Epoch 36/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9317 - loss: 0.2384 - val_accuracy: 0.5520 - val_loss: 2.0031
Epoch 37/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9644 - loss: 0.1447 - val_accuracy: 0.5000 - val_loss: 2.1297
Epoch 38/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9639 - loss: 0.1728 - val_accuracy: 0.5600 - val_loss: 1.9049
Epoch 39/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9747 - loss: 0.1422 - val_accuracy: 0.5280 - val_loss: 1.9816
Epoch 40/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9832 - loss: 0.1293 - val_accuracy: 0.5480 - val_loss: 2.0905
Epoch 41/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9685 - loss: 0.1435 - val_accuracy: 0.5560 - val_loss: 2.0139
Epoch 42/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9752 - loss: 0.1304 - val_accuracy: 0.5400 - val_loss: 2.0129
Epoch 43/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9786 - loss: 0.1223 - val_accuracy: 0.5400 - val_loss: 2.0416
Epoch 44/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9773 - loss: 0.1224 - val_accuracy: 0.5240 - val_loss: 2.0511
Epoch 45/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9820 - loss: 0.1078 - val_accuracy: 0.5120 - val_loss: 2.1525
Epoch 46/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9852 - loss: 0.1061 - val_accuracy: 0.5160 - val_loss: 2.1780
Epoch 47/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9792 - loss: 0.1071 - val_accuracy: 0.5480 - val_loss: 2.1577
Epoch 48/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9851 - loss: 0.0919 - val_accuracy: 0.5440 - val_loss: 2.1966
Epoch 49/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9976 - loss: 0.0612 - val_accuracy: 0.5480 - val_loss: 2.2887
Epoch 50/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9837 - loss: 0.0945 - val_accuracy: 0.5040 - val_loss: 2.3680
Epoch 51/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9740 - loss: 0.1217 - val_accuracy: 0.5360 - val_loss: 2.3032
Epoch 52/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9852 - loss: 0.0860 - val_accuracy: 0.5320 - val_loss: 2.2783
Epoch 53/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9926 - loss: 0.0639 - val_accuracy: 0.5400 - val_loss: 2.2410
Epoch 54/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9918 - loss: 0.0550 - val_accuracy: 0.5320 - val_loss: 2.2745
Epoch 55/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9946 - loss: 0.0524 - val_accuracy: 0.5400 - val_loss: 2.3175
Epoch 56/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9961 - loss: 0.0414 - val_accuracy: 0.5400 - val_loss: 2.3332
Epoch 57/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9958 - loss: 0.0412 - val_accuracy: 0.5440 - val_loss: 2.3241
Epoch 58/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9989 - loss: 0.0356 - val_accuracy: 0.5320 - val_loss: 2.3560
Epoch 59/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9978 - loss: 0.0358 - val_accuracy: 0.5360 - val_loss: 2.4160
Epoch 60/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9970 - loss: 0.0377 - val_accuracy: 0.5400 - val_loss: 2.3731
Epoch 61/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9982 - loss: 0.0337 - val_accuracy: 0.5320 - val_loss: 2.4049
Epoch 62/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9967 - loss: 0.0301 - val_accuracy: 0.5200 - val_loss: 2.4363
Epoch 63/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9963 - loss: 0.0294 - val_accuracy: 0.5240 - val_loss: 2.4332
Epoch 64/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9987 - loss: 0.0244 - val_accuracy: 0.5360 - val_loss: 2.4721
Epoch 65/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9966 - loss: 0.0384 - val_accuracy: 0.5360 - val_loss: 2.4973
Epoch 66/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9977 - loss: 0.0226 - val_accuracy: 0.5600 - val_loss: 2.5251
Epoch 67/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9986 - loss: 0.0261 - val_accuracy: 0.5320 - val_loss: 2.5316
Epoch 68/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9966 - loss: 0.0312 - val_accuracy: 0.5320 - val_loss: 2.5184
Epoch 69/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9904 - loss: 0.0449 - val_accuracy: 0.5640 - val_loss: 2.5873
Epoch 70/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9885 - loss: 0.0553 - val_accuracy: 0.5240 - val_loss: 2.5717
Epoch 71/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9986 - loss: 0.0251 - val_accuracy: 0.5480 - val_loss: 2.6514
Epoch 72/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9922 - loss: 0.0479 - val_accuracy: 0.5320 - val_loss: 2.7161
Epoch 73/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.9922 - loss: 0.0362 - val_accuracy: 0.5240 - val_loss: 2.6747
Epoch 74/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9955 - loss: 0.0376 - val_accuracy: 0.5240 - val_loss: 2.7733
Epoch 75/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9404 - loss: 0.2041 - val_accuracy: 0.5440 - val_loss: 2.7325
Epoch 76/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9493 - loss: 0.2134 - val_accuracy: 0.5080 - val_loss: 2.7039
Epoch 77/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9624 - loss: 0.1325 - val_accuracy: 0.5360 - val_loss: 2.5794
Epoch 78/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9691 - loss: 0.0975 - val_accuracy: 0.5440 - val_loss: 2.7746
Epoch 79/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9751 - loss: 0.0712 - val_accuracy: 0.5400 - val_loss: 2.7646
Epoch 80/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9895 - loss: 0.0567 - val_accuracy: 0.5320 - val_loss: 2.5701
Epoch 81/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.9999 - loss: 0.0226 - val_accuracy: 0.5280 - val_loss: 2.6910
Epoch 82/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.9965 - loss: 0.0295 - val_accuracy: 0.5440 - val_loss: 2.6383
Epoch 83/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9965 - loss: 0.0197 - val_accuracy: 0.5600 - val_loss: 2.7159
Epoch 84/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - accuracy: 0.9988 - loss: 0.0120 - val_accuracy: 0.5440 - val_loss: 2.7271
Epoch 85/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9986 - loss: 0.0124 - val_accuracy: 0.5480 - val_loss: 2.7273
Epoch 86/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.9986 - loss: 0.0102 - val_accuracy: 0.5400 - val_loss: 2.7168
Epoch 87/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9963 - loss: 0.0141 - val_accuracy: 0.5160 - val_loss: 2.7817
Epoch 88/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9978 - loss: 0.0171 - val_accuracy: 0.5520 - val_loss: 2.7674
Epoch 89/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9969 - loss: 0.0126 - val_accuracy: 0.5360 - val_loss: 2.7392
Epoch 90/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9964 - loss: 0.0161 - val_accuracy: 0.5280 - val_loss: 2.7826
Epoch 91/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9949 - loss: 0.0115 - val_accuracy: 0.5520 - val_loss: 2.7978
Epoch 92/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9997 - loss: 0.0078 - val_accuracy: 0.5280 - val_loss: 2.8056
Epoch 93/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9970 - loss: 0.0130 - val_accuracy: 0.5440 - val_loss: 2.8000
Epoch 94/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9993 - loss: 0.0071 - val_accuracy: 0.5360 - val_loss: 2.8448
Epoch 95/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9974 - loss: 0.0182 - val_accuracy: 0.5600 - val_loss: 2.8185
Epoch 96/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9990 - loss: 0.0081 - val_accuracy: 0.5520 - val_loss: 2.8214
Epoch 97/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9982 - loss: 0.0072 - val_accuracy: 0.5440 - val_loss: 2.8363
Epoch 98/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9979 - loss: 0.0085 - val_accuracy: 0.5360 - val_loss: 2.8516
Epoch 99/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9995 - loss: 0.0065 - val_accuracy: 0.5520 - val_loss: 2.8632
Epoch 100/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9994 - loss: 0.0075 - val_accuracy: 0.5440 - val_loss: 2.9018
CPU times: user 22.2 s, sys: 1.13 s, total: 23.3 s
Wall time: 30.2 s
Out[61]:
<keras.src.callbacks.history.History at 0x7cda2949b910>

Model Evaluation¶

In [62]:
# Make predictions on the test set
Y_pred = model1.predict(X_test)

Y_pred = [np.argmax(i) for i in Y_pred]
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step
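
`model1.predict` returns one row of 10 softmax probabilities per test track; `np.argmax` collapses each row to its most probable class index. A small sketch with fake prediction rows:

```python
import numpy as np

# Fake model output: 2 samples x 3 classes of softmax probabilities
Y_pred_probs = np.array([[0.1, 0.8, 0.1],
                         [0.6, 0.3, 0.1]])

Y_pred = [int(np.argmax(row)) for row in Y_pred_probs]
print(Y_pred)  # [1, 0]
```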
In [63]:
# Set style as dark
sns.set_style("dark")

# Set figure size
plt.figure(figsize = (15, 8))

# Plot the title
plt.title("CONFUSION MATRIX FOR SONG GENRE PREDICTION")

# Confusion matrix (rows = actual values, columns = predicted values)
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)

# Plot the confusion matrix as heatmap. Use BuPu
sns.heatmap(cm, annot = True, cmap = "BuPu", fmt = 'g', cbar = False)

# Set X-label and Y-label
plt.xlabel("PREDICTED VALUES")
plt.ylabel("ACTUAL VALUES")

# Show the plot
plt.show()

# Print the metrics
print(classification_report(Y_test, Y_pred))
No description has been provided for this image
              precision    recall  f1-score   support

           0       0.58      0.58      0.58        24
           1       0.76      0.81      0.79        27
           2       0.42      0.33      0.37        24
           3       0.38      0.39      0.38        23
           4       0.40      0.38      0.39        26
           5       0.60      0.62      0.61        24
           6       0.66      0.89      0.76        28
           7       0.70      0.67      0.68        24
           8       0.48      0.36      0.41        28
           9       0.32      0.32      0.32        22

    accuracy                           0.54       250
   macro avg       0.53      0.54      0.53       250
weighted avg       0.53      0.54      0.54       250
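
As a reminder of scikit-learn's orientation: `confusion_matrix(y_true, y_pred)` puts actual classes in rows and predicted classes in columns, so heatmap axis labels should follow that convention. A tiny check:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1]
y_pred = [0, 1, 0]
cm = confusion_matrix(y_true, y_pred)

# Row 1 (actual class 1) has one sample predicted as 0 and one as 1
print(cm)
# [[1 0]
#  [1 1]]
```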

  • Observations
    • Confusion Matrix
      • Best classified genres (by recall): Metal (0.89), Classical (0.81), Pop (0.67).
      • High misclassification: Country, Disco, Hip-hop, Reggae, and Rock (recall ≤ 0.39).
      • Overlap observed: Hip-hop, Jazz, Reggae, and Rock (classes 4, 5, 8, 9) are frequently confused with one another.
    • Classification Report
      • Overall accuracy: 54% (moderate performance).
      • Best performing genres: Classical (0.79 F1), Metal (0.76 F1), Pop (0.68 F1).
      • Weak genres: Rock (0.32 F1), Country (0.37 F1), Disco (0.38 F1), Hip-hop (0.39 F1).
      • Macro avg F1: 0.53, Weighted avg F1: 0.54.
    • Key Insights
      • Distinctive genres (Classical, Metal, Jazz) perform well.
      • Overlapping genres (Disco, Hip-hop, Reggae, Rock) cause misclassification.
      • Improvements: more training data, feature engineering, CNN/LSTM models, data augmentation.
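Of the listed improvements, waveform-level data augmentation is simple to sketch. A minimal NumPy-only example (the noise scale and shift amount are illustrative assumptions; in practice the augmented signals would go through the same MFCC extraction as the originals):

```python
import numpy as np

def add_noise(y, noise_factor=0.005):
    """Inject Gaussian noise scaled by noise_factor."""
    return y + noise_factor * np.random.randn(len(y))

def time_shift(y, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(y, shift)

rng = np.random.default_rng(42)
y = rng.standard_normal(22050)                   # 1 s of dummy audio at 22.05 kHz
augmented = [add_noise(y), time_shift(y, 2205)]  # two extra training variants
print([a.shape for a in augmented])
```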
In [70]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, Input

# Create a Sequential Model
model2 = Sequential([
    Input(shape=(40,)),  # Input Layer

    Dense(256, activation='relu'),
    BatchNormalization(),  # Normalize activations
    Dropout(0.3),  # Regularization

    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),

    Dense(64, activation='relu'),

    Dense(10, activation='softmax')  # Output layer for 10 classes
])
In [71]:
# Print Summary of the model
model2.summary()
Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense_10 (Dense)                     │ (None, 256)                 │          10,496 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization                  │ (None, 256)                 │           1,024 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout (Dropout)                    │ (None, 256)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_11 (Dense)                     │ (None, 128)                 │          32,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_1                │ (None, 128)                 │             512 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_1 (Dropout)                  │ (None, 128)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_12 (Dense)                     │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_13 (Dense)                     │ (None, 10)                  │             650 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 53,834 (210.29 KB)
 Trainable params: 53,066 (207.29 KB)
 Non-trainable params: 768 (3.00 KB)
In [72]:
model2.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
In [73]:
num_epochs = 100
batch_size = 32

history = model2.fit(
    X_train, Y_train,
    validation_data=(X_test, Y_test),
    epochs=num_epochs,
    batch_size=batch_size,
    verbose=1
)
Epoch 1/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 3s 25ms/step - accuracy: 0.1614 - loss: 2.5925 - val_accuracy: 0.2240 - val_loss: 4.0188
Epoch 2/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3664 - loss: 1.8161 - val_accuracy: 0.2960 - val_loss: 3.1013
Epoch 3/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.4089 - loss: 1.6690 - val_accuracy: 0.3520 - val_loss: 2.3665
Epoch 4/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.4676 - loss: 1.4953 - val_accuracy: 0.3920 - val_loss: 1.9178
Epoch 5/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.4881 - loss: 1.5140 - val_accuracy: 0.4360 - val_loss: 1.5959
Epoch 6/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.5668 - loss: 1.3085 - val_accuracy: 0.4680 - val_loss: 1.4722
Epoch 7/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 34ms/step - accuracy: 0.5578 - loss: 1.2672 - val_accuracy: 0.5120 - val_loss: 1.3836
Epoch 8/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step - accuracy: 0.5857 - loss: 1.1701 - val_accuracy: 0.5320 - val_loss: 1.3469
Epoch 9/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6076 - loss: 1.1591 - val_accuracy: 0.5440 - val_loss: 1.3474
Epoch 10/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.6334 - loss: 1.0489 - val_accuracy: 0.5600 - val_loss: 1.3353
Epoch 11/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6221 - loss: 1.0010 - val_accuracy: 0.5840 - val_loss: 1.2491
Epoch 12/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - accuracy: 0.6639 - loss: 0.9914 - val_accuracy: 0.5800 - val_loss: 1.2682
Epoch 13/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6158 - loss: 0.9932 - val_accuracy: 0.5800 - val_loss: 1.2800
Epoch 14/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7139 - loss: 0.8758 - val_accuracy: 0.5800 - val_loss: 1.2948
Epoch 15/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6930 - loss: 0.9128 - val_accuracy: 0.5800 - val_loss: 1.2961
Epoch 16/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6664 - loss: 0.9184 - val_accuracy: 0.5520 - val_loss: 1.3645
Epoch 17/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6872 - loss: 0.8436 - val_accuracy: 0.5720 - val_loss: 1.3270
Epoch 18/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7086 - loss: 0.8433 - val_accuracy: 0.6000 - val_loss: 1.3041
Epoch 19/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6948 - loss: 0.8479 - val_accuracy: 0.6000 - val_loss: 1.2850
Epoch 20/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7181 - loss: 0.8038 - val_accuracy: 0.5920 - val_loss: 1.3278
Epoch 21/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.6989 - loss: 0.8127 - val_accuracy: 0.6120 - val_loss: 1.2606
Epoch 22/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7356 - loss: 0.7398 - val_accuracy: 0.6080 - val_loss: 1.2664
Epoch 23/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7410 - loss: 0.7351 - val_accuracy: 0.5800 - val_loss: 1.3309
Epoch 24/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7217 - loss: 0.8048 - val_accuracy: 0.5960 - val_loss: 1.2819
Epoch 25/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7509 - loss: 0.7457 - val_accuracy: 0.5920 - val_loss: 1.3517
Epoch 26/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7436 - loss: 0.6820 - val_accuracy: 0.5960 - val_loss: 1.3294
Epoch 27/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7671 - loss: 0.6754 - val_accuracy: 0.5600 - val_loss: 1.3472
Epoch 28/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7593 - loss: 0.6513 - val_accuracy: 0.6040 - val_loss: 1.3363
Epoch 29/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7648 - loss: 0.6582 - val_accuracy: 0.6280 - val_loss: 1.2918
Epoch 30/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7907 - loss: 0.6515 - val_accuracy: 0.6240 - val_loss: 1.3183
Epoch 31/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7754 - loss: 0.6526 - val_accuracy: 0.6160 - val_loss: 1.3323
Epoch 32/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7865 - loss: 0.5884 - val_accuracy: 0.6080 - val_loss: 1.3120
Epoch 33/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7692 - loss: 0.6057 - val_accuracy: 0.6120 - val_loss: 1.3141
Epoch 34/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7971 - loss: 0.5570 - val_accuracy: 0.6280 - val_loss: 1.2801
Epoch 35/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7934 - loss: 0.6168 - val_accuracy: 0.6480 - val_loss: 1.2784
Epoch 36/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8012 - loss: 0.5556 - val_accuracy: 0.6240 - val_loss: 1.3097
Epoch 37/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8218 - loss: 0.5205 - val_accuracy: 0.6040 - val_loss: 1.3306
Epoch 38/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7764 - loss: 0.6479 - val_accuracy: 0.6160 - val_loss: 1.3189
Epoch 39/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.7940 - loss: 0.5536 - val_accuracy: 0.6160 - val_loss: 1.3918
Epoch 40/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8308 - loss: 0.4865 - val_accuracy: 0.6240 - val_loss: 1.3394
Epoch 41/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8504 - loss: 0.4926 - val_accuracy: 0.6200 - val_loss: 1.3789
Epoch 42/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8522 - loss: 0.4395 - val_accuracy: 0.6240 - val_loss: 1.3533
Epoch 43/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8349 - loss: 0.4627 - val_accuracy: 0.6320 - val_loss: 1.3493
Epoch 44/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.8405 - loss: 0.4351 - val_accuracy: 0.6200 - val_loss: 1.4002
Epoch 45/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8508 - loss: 0.4468 - val_accuracy: 0.6120 - val_loss: 1.4628
Epoch 46/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8310 - loss: 0.4709 - val_accuracy: 0.6240 - val_loss: 1.4210
Epoch 47/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.8481 - loss: 0.4493 - val_accuracy: 0.5960 - val_loss: 1.3980
Epoch 48/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8584 - loss: 0.4333 - val_accuracy: 0.6160 - val_loss: 1.4414
Epoch 49/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.8293 - loss: 0.4359 - val_accuracy: 0.6040 - val_loss: 1.3537
Epoch 50/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 10ms/step - accuracy: 0.8759 - loss: 0.4010 - val_accuracy: 0.5960 - val_loss: 1.4421
Epoch 51/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8614 - loss: 0.3898 - val_accuracy: 0.6160 - val_loss: 1.4000
Epoch 52/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8672 - loss: 0.3979 - val_accuracy: 0.6160 - val_loss: 1.4364
Epoch 53/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8691 - loss: 0.4203 - val_accuracy: 0.5920 - val_loss: 1.4784
Epoch 54/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8871 - loss: 0.3592 - val_accuracy: 0.6040 - val_loss: 1.4443
Epoch 55/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8495 - loss: 0.4124 - val_accuracy: 0.6160 - val_loss: 1.4901
Epoch 56/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8758 - loss: 0.3626 - val_accuracy: 0.6200 - val_loss: 1.4679
Epoch 57/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8611 - loss: 0.3785 - val_accuracy: 0.6240 - val_loss: 1.4573
Epoch 58/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8655 - loss: 0.3717 - val_accuracy: 0.6360 - val_loss: 1.4217
Epoch 59/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8577 - loss: 0.3781 - val_accuracy: 0.6320 - val_loss: 1.4463
Epoch 60/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8752 - loss: 0.3829 - val_accuracy: 0.6560 - val_loss: 1.4227
Epoch 61/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8634 - loss: 0.3975 - val_accuracy: 0.6440 - val_loss: 1.3981
Epoch 62/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8791 - loss: 0.3751 - val_accuracy: 0.6440 - val_loss: 1.4193
Epoch 63/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8520 - loss: 0.4202 - val_accuracy: 0.6240 - val_loss: 1.3619
Epoch 64/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8877 - loss: 0.3440 - val_accuracy: 0.6600 - val_loss: 1.4788
Epoch 65/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8700 - loss: 0.3814 - val_accuracy: 0.6120 - val_loss: 1.4434
Epoch 66/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8669 - loss: 0.3302 - val_accuracy: 0.6080 - val_loss: 1.5090
Epoch 67/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8739 - loss: 0.3579 - val_accuracy: 0.5960 - val_loss: 1.5128
Epoch 68/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8937 - loss: 0.3272 - val_accuracy: 0.6280 - val_loss: 1.5566
Epoch 69/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8669 - loss: 0.3397 - val_accuracy: 0.6320 - val_loss: 1.4911
Epoch 70/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8727 - loss: 0.3298 - val_accuracy: 0.6320 - val_loss: 1.5200
Epoch 71/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8729 - loss: 0.3065 - val_accuracy: 0.6320 - val_loss: 1.4923
Epoch 72/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8689 - loss: 0.3115 - val_accuracy: 0.6160 - val_loss: 1.5116
Epoch 73/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8708 - loss: 0.3342 - val_accuracy: 0.6120 - val_loss: 1.6131
Epoch 74/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8528 - loss: 0.3673 - val_accuracy: 0.6160 - val_loss: 1.5809
Epoch 75/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8910 - loss: 0.3101 - val_accuracy: 0.6120 - val_loss: 1.5044
Epoch 76/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8775 - loss: 0.3500 - val_accuracy: 0.6200 - val_loss: 1.4944
Epoch 77/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8823 - loss: 0.3157 - val_accuracy: 0.6040 - val_loss: 1.5729
Epoch 78/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.8680 - loss: 0.3624 - val_accuracy: 0.5920 - val_loss: 1.5684
Epoch 79/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.9012 - loss: 0.2642 - val_accuracy: 0.6280 - val_loss: 1.5203
Epoch 80/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9168 - loss: 0.2759 - val_accuracy: 0.6400 - val_loss: 1.4986
Epoch 81/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9055 - loss: 0.2722 - val_accuracy: 0.6480 - val_loss: 1.5127
Epoch 82/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8970 - loss: 0.2675 - val_accuracy: 0.6280 - val_loss: 1.5707
Epoch 83/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8880 - loss: 0.3103 - val_accuracy: 0.6080 - val_loss: 1.6644
Epoch 84/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9083 - loss: 0.2493 - val_accuracy: 0.6120 - val_loss: 1.6682
Epoch 85/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9095 - loss: 0.2769 - val_accuracy: 0.6280 - val_loss: 1.6314
Epoch 86/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8885 - loss: 0.3101 - val_accuracy: 0.6440 - val_loss: 1.6029
Epoch 87/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9176 - loss: 0.2489 - val_accuracy: 0.6240 - val_loss: 1.6952
Epoch 88/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9038 - loss: 0.2822 - val_accuracy: 0.5920 - val_loss: 1.7332
Epoch 89/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9088 - loss: 0.2844 - val_accuracy: 0.6280 - val_loss: 1.6102
Epoch 90/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.9135 - loss: 0.2518 - val_accuracy: 0.6320 - val_loss: 1.6249
Epoch 91/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.9037 - loss: 0.2654 - val_accuracy: 0.6000 - val_loss: 1.6024
Epoch 92/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.8816 - loss: 0.2891 - val_accuracy: 0.6080 - val_loss: 1.5582
Epoch 93/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.9085 - loss: 0.2587 - val_accuracy: 0.6400 - val_loss: 1.5490
Epoch 94/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.9103 - loss: 0.2508 - val_accuracy: 0.6120 - val_loss: 1.7080
Epoch 95/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9238 - loss: 0.2499 - val_accuracy: 0.6400 - val_loss: 1.6923
Epoch 96/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8938 - loss: 0.2576 - val_accuracy: 0.6360 - val_loss: 1.6316
Epoch 97/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9206 - loss: 0.2327 - val_accuracy: 0.6400 - val_loss: 1.6381
Epoch 98/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9276 - loss: 0.2366 - val_accuracy: 0.6080 - val_loss: 1.6746
Epoch 99/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9175 - loss: 0.2217 - val_accuracy: 0.5880 - val_loss: 1.7912
Epoch 100/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.9107 - loss: 0.2364 - val_accuracy: 0.5960 - val_loss: 1.7483
In [74]:
import matplotlib.pyplot as plt

# Plot training & validation accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# Plot loss
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
[Figure: training vs. validation accuracy]
[Figure: training vs. validation loss]
  • Observations
    • Overfitting Detected
      • Training accuracy increases steadily and reaches ~90%, while validation accuracy stagnates around 60%.
      • The widening gap suggests the model is memorizing training data but generalizing poorly.
    • Validation Loss Divergence
      • Training loss decreases consistently, indicating learning progress.
      • Validation loss stops decreasing early and starts fluctuating (~epoch 20), confirming overfitting.
    • Possible Fixes
      • Reduce overfitting: Increase dropout, add L2 regularization, or apply data augmentation.
      • Early stopping: Stop training around epoch 20-30 to avoid further overfitting.
      • Try a different architecture: A CNN-based model may perform better on spectral features.
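The first fix (more dropout plus L2 regularization) can be sketched as a variant of model2; the 1e-4 weight decay and 0.4 dropout rate are illustrative choices, not values tuned here:

```python
from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

# Same 40-feature input as model2, but with an L2 penalty on the Dense
# kernels and a higher dropout rate to curb the overfitting seen above.
model_reg = Sequential([
    Input(shape=(40,)),
    Dense(256, activation='relu', kernel_regularizer=regularizers.l2(1e-4)),
    Dropout(0.4),
    Dense(128, activation='relu', kernel_regularizer=regularizers.l2(1e-4)),
    Dropout(0.4),
    Dense(10, activation='softmax'),
])
model_reg.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
model_reg.summary()
```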
In [75]:
# Make predictions on the test set
Y_pred = model2.predict(X_test)

Y_pred = [np.argmax(i) for i in Y_pred]
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step
In [76]:
# Set style as dark
sns.set_style("dark")

# Set figure size
plt.figure(figsize = (15, 8))

# Plot the title
plt.title("Confusion Matrix for Song Genre Prediction")

# Confusion matrix
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)

# Plot the confusion matrix as heatmap. Use BuPu
sns.heatmap(cm, annot = True, cmap = "BuPu", fmt = 'g', cbar = False)

# Set X-label and Y-label (rows of the matrix are true labels, columns are predictions)
plt.xlabel("PREDICTED VALUES")
plt.ylabel("ACTUAL VALUES")

# Show the plot
plt.show()

# Print the metrics
print(classification_report(Y_test, Y_pred))
[Figure: confusion matrix heatmap]
              precision    recall  f1-score   support

           0       0.71      0.62      0.67        24
           1       0.78      0.78      0.78        27
           2       0.65      0.46      0.54        24
           3       0.38      0.52      0.44        23
           4       0.43      0.62      0.51        26
           5       0.67      0.67      0.67        24
           6       0.79      0.79      0.79        28
           7       0.72      0.75      0.73        24
           8       0.53      0.36      0.43        28
           9       0.40      0.36      0.38        22

    accuracy                           0.60       250
   macro avg       0.60      0.59      0.59       250
weighted avg       0.61      0.60      0.60       250

Model Performance Comparison

| Class | Precision (Before) | Precision (After) | Recall (Before) | Recall (After) | F1-Score (Before) | F1-Score (After) | Support |
|---|---|---|---|---|---|---|---|
| 0 | 0.58 | 0.71 | 0.58 | 0.62 | 0.58 | 0.67 | 24 |
| 1 | 0.76 | 0.78 | 0.81 | 0.78 | 0.79 | 0.78 | 27 |
| 2 | 0.42 | 0.65 | 0.33 | 0.46 | 0.37 | 0.54 | 24 |
| 3 | 0.38 | 0.38 | 0.39 | 0.52 | 0.38 | 0.44 | 23 |
| 4 | 0.40 | 0.43 | 0.38 | 0.62 | 0.39 | 0.51 | 26 |
| 5 | 0.60 | 0.67 | 0.62 | 0.67 | 0.61 | 0.67 | 24 |
| 6 | 0.66 | 0.79 | 0.89 | 0.79 | 0.76 | 0.79 | 28 |
| 7 | 0.70 | 0.72 | 0.67 | 0.75 | 0.68 | 0.73 | 24 |
| 8 | 0.48 | 0.53 | 0.36 | 0.36 | 0.41 | 0.43 | 28 |
| 9 | 0.32 | 0.40 | 0.32 | 0.36 | 0.32 | 0.38 | 22 |

Summary

| Metric | Before | After |
|---|---|---|
| Accuracy | 0.54 | 0.60 |
| Macro Avg F1 | 0.53 | 0.59 |
| Weighted Avg F1 | 0.54 | 0.60 |

Observations:

  • Overall accuracy improved from 54% to 60%.
  • Almost every class F1-score increased; only Classical (class 1) dipped marginally (0.79 → 0.78).
  • Genres 0, 2, and 4 show the largest F1 gains (+0.09 to +0.17).
  • Reggae (class 8) remains the weakest on recall (0.36, unchanged).
  • Batch normalization and dropout likely stabilized training and reduced overfitting.
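Tables like the one above can be generated programmatically rather than copied by hand: `classification_report` accepts `output_dict=True`, which loads directly into a pandas DataFrame. A sketch with dummy labels (not the actual predictions):

```python
import pandas as pd
from sklearn.metrics import classification_report

# Dummy ground truth and two sets of model predictions, for illustration.
y_true   = [0, 0, 1, 1, 2, 2]
y_pred_a = [0, 1, 1, 1, 2, 0]   # "before"
y_pred_b = [0, 0, 1, 1, 2, 2]   # "after"

def per_class_f1(y_true, y_pred):
    rep = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    # Keep only the per-class entries (keys like '0', '1', ...), drop the averages.
    classes = {k: v for k, v in rep.items()
               if k not in ('accuracy', 'macro avg', 'weighted avg')}
    return pd.DataFrame(classes).T['f1-score']

comparison = pd.DataFrame({'Before': per_class_f1(y_true, y_pred_a),
                           'After':  per_class_f1(y_true, y_pred_b)})
print(comparison.round(2))
```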
In [77]:
model3 = Sequential([
    Input(shape=(40,)),  # MFCC has 40 features

    Dense(512, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),

    Dense(256, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),

    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),

    Dense(64, activation='relu'),
    BatchNormalization(),
    Dropout(0.3),

    Dense(10, activation='softmax')  # 10 classes
])

model3.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model3.summary()
Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ dense_14 (Dense)                     │ (None, 512)                 │          20,992 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_2                │ (None, 512)                 │           2,048 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_2 (Dropout)                  │ (None, 512)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_15 (Dense)                     │ (None, 256)                 │         131,328 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_3                │ (None, 256)                 │           1,024 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_3 (Dropout)                  │ (None, 256)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_16 (Dense)                     │ (None, 128)                 │          32,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_4                │ (None, 128)                 │             512 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_4 (Dropout)                  │ (None, 128)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_17 (Dense)                     │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_5                │ (None, 64)                  │             256 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_5 (Dropout)                  │ (None, 64)                  │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_18 (Dense)                     │ (None, 10)                  │             650 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 197,962 (773.29 KB)
 Trainable params: 196,042 (765.79 KB)
 Non-trainable params: 1,920 (7.50 KB)
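The parameter counts in this summary can be verified by hand: a Dense layer holds in×out weights plus out biases, and BatchNormalization holds 4 parameters per feature (trainable gamma and beta, plus non-trainable moving mean and variance). A quick arithmetic check:

```python
# Dense: in*out weights + out biases; BatchNorm: 4 params per feature.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

def bn_params(n):
    return 4 * n

total = sum([
    dense_params(40, 512),  bn_params(512),
    dense_params(512, 256), bn_params(256),
    dense_params(256, 128), bn_params(128),
    dense_params(128, 64),  bn_params(64),
    dense_params(64, 10),
])
non_trainable = 2 * (512 + 256 + 128 + 64)   # moving mean + variance per BN layer
print(total, non_trainable, total - non_trainable)
```

These totals reproduce the 197,962 total, 1,920 non-trainable, and 196,042 trainable parameters reported above.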
In [78]:
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

model3.fit(
    X_train, Y_train,
    validation_data=(X_test, Y_test),
    epochs=150, batch_size=32,
    callbacks=[early_stop],
    verbose=1
)
Epoch 1/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 5s 26ms/step - accuracy: 0.1228 - loss: 2.9445 - val_accuracy: 0.1960 - val_loss: 5.3707
Epoch 2/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3126 - loss: 2.0838 - val_accuracy: 0.2840 - val_loss: 3.4806
Epoch 3/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3622 - loss: 1.8702 - val_accuracy: 0.3680 - val_loss: 2.9951
Epoch 4/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3830 - loss: 1.7712 - val_accuracy: 0.3760 - val_loss: 2.3111
Epoch 5/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.4205 - loss: 1.6564 - val_accuracy: 0.3960 - val_loss: 1.9312
Epoch 6/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.4482 - loss: 1.5810 - val_accuracy: 0.4640 - val_loss: 1.5538
Epoch 7/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.4401 - loss: 1.5212 - val_accuracy: 0.5040 - val_loss: 1.4928
Epoch 8/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 20ms/step - accuracy: 0.4667 - loss: 1.4911 - val_accuracy: 0.5480 - val_loss: 1.3615
Epoch 9/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.5174 - loss: 1.3848 - val_accuracy: 0.5480 - val_loss: 1.3262
Epoch 10/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 21ms/step - accuracy: 0.5363 - loss: 1.3024 - val_accuracy: 0.5280 - val_loss: 1.3417
Epoch 11/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - accuracy: 0.5518 - loss: 1.3130 - val_accuracy: 0.5320 - val_loss: 1.3164
Epoch 12/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.5347 - loss: 1.2810 - val_accuracy: 0.5680 - val_loss: 1.2725
Epoch 13/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 21ms/step - accuracy: 0.6173 - loss: 1.1125 - val_accuracy: 0.5640 - val_loss: 1.2819
Epoch 14/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 23ms/step - accuracy: 0.5888 - loss: 1.2067 - val_accuracy: 0.5400 - val_loss: 1.3178
Epoch 15/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.5916 - loss: 1.1562 - val_accuracy: 0.5640 - val_loss: 1.2765
Epoch 16/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6538 - loss: 1.0487 - val_accuracy: 0.5720 - val_loss: 1.2659
Epoch 17/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6110 - loss: 1.1025 - val_accuracy: 0.5560 - val_loss: 1.2505
Epoch 18/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.6223 - loss: 1.1070 - val_accuracy: 0.5760 - val_loss: 1.2803
Epoch 19/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6518 - loss: 1.0476 - val_accuracy: 0.5480 - val_loss: 1.3092
Epoch 20/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.6375 - loss: 1.0310 - val_accuracy: 0.5560 - val_loss: 1.3138
Epoch 21/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.6734 - loss: 0.9847 - val_accuracy: 0.5480 - val_loss: 1.3523
Epoch 22/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6211 - loss: 1.0464 - val_accuracy: 0.5600 - val_loss: 1.3280
Epoch 23/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.6785 - loss: 0.9450 - val_accuracy: 0.5680 - val_loss: 1.2899
Epoch 24/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.6449 - loss: 0.9460 - val_accuracy: 0.5480 - val_loss: 1.3180
Epoch 25/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.6991 - loss: 0.9414 - val_accuracy: 0.5440 - val_loss: 1.3512
Epoch 26/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.7087 - loss: 0.8700 - val_accuracy: 0.5600 - val_loss: 1.3259
Epoch 27/150
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.7406 - loss: 0.8079 - val_accuracy: 0.5880 - val_loss: 1.3140
Out[78]:
<keras.src.callbacks.history.History at 0x7cda242909d0>
In [79]:
# Make predictions on the test set
Y_pred = model3.predict(X_test)

Y_pred = [np.argmax(i) for i in Y_pred]
WARNING:tensorflow:5 out of the last 17 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x7cda2470ac00> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step
In [80]:
# Set style as dark
sns.set_style("dark")

# Set figure size
plt.figure(figsize = (15, 8))

# Plot the title
plt.title("Confusion Matrix for Song Genre Prediction")

# Confusion matrix
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)

# Plot the confusion matrix as heatmap. Use BuPu
sns.heatmap(cm, annot = True, cmap = "BuPu", fmt = 'g', cbar = False)

# Set X-label and Y-label (rows of the matrix are true labels, columns are predictions)
plt.xlabel("PREDICTED VALUES")
plt.ylabel("ACTUAL VALUES")

# Show the plot
plt.show()

# Print the metrics
print(classification_report(Y_test, Y_pred))
[Figure: confusion matrix heatmap]
              precision    recall  f1-score   support

           0       0.71      0.42      0.53        24
           1       0.85      0.85      0.85        27
           2       0.42      0.42      0.42        24
           3       0.31      0.65      0.42        23
           4       0.45      0.38      0.42        26
           5       0.75      0.50      0.60        24
           6       0.81      0.75      0.78        28
           7       0.76      0.79      0.78        24
           8       0.45      0.46      0.46        28
           9       0.32      0.27      0.29        22

    accuracy                           0.56       250
   macro avg       0.58      0.55      0.55       250
weighted avg       0.59      0.56      0.56       250

| Class | Precision (Run 1) | Recall (Run 1) | F1 (Run 1) | Precision (Run 2) | Recall (Run 2) | F1 (Run 2) | Precision (Run 3) | Recall (Run 3) | F1 (Run 3) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.58 | 0.58 | 0.58 | 0.71 | 0.62 | 0.67 | 0.71 | 0.42 | 0.53 |
| 1 | 0.76 | 0.81 | 0.79 | 0.78 | 0.78 | 0.78 | 0.85 | 0.85 | 0.85 |
| 2 | 0.42 | 0.33 | 0.37 | 0.65 | 0.46 | 0.54 | 0.42 | 0.42 | 0.42 |
| 3 | 0.38 | 0.39 | 0.38 | 0.38 | 0.52 | 0.44 | 0.31 | 0.65 | 0.42 |
| 4 | 0.40 | 0.38 | 0.39 | 0.43 | 0.62 | 0.51 | 0.45 | 0.38 | 0.42 |
| 5 | 0.60 | 0.62 | 0.61 | 0.67 | 0.67 | 0.67 | 0.75 | 0.50 | 0.60 |
| 6 | 0.66 | 0.89 | 0.76 | 0.79 | 0.79 | 0.79 | 0.81 | 0.75 | 0.78 |
| 7 | 0.70 | 0.67 | 0.68 | 0.72 | 0.75 | 0.73 | 0.76 | 0.79 | 0.78 |
| 8 | 0.48 | 0.36 | 0.41 | 0.53 | 0.36 | 0.43 | 0.45 | 0.46 | 0.46 |
| 9 | 0.32 | 0.32 | 0.32 | 0.40 | 0.36 | 0.38 | 0.32 | 0.27 | 0.29 |
| Accuracy | 0.54 | 0.54 | 0.54 | 0.60 | 0.60 | 0.60 | 0.56 | 0.56 | 0.56 |
| Macro Avg | 0.53 | 0.54 | 0.53 | 0.60 | 0.59 | 0.59 | 0.58 | 0.55 | 0.55 |
| Weighted Avg | 0.53 | 0.54 | 0.54 | 0.61 | 0.60 | 0.60 | 0.59 | 0.56 | 0.56 |

Here Run 1 is model1, Run 2 is model2, and Run 3 is model3 (the three classification reports printed above).

Observations:

  • Best overall accuracy: Run 2 / model2 (0.60).
  • Highest precision and recall for class 1 (Classical): Run 3 (0.85 / 0.85).
  • Class 7 (Pop) is consistently strong across runs (F1 0.68 to 0.78).
  • Classes 3 (Disco) and 9 (Rock) struggle in all runs (F1 ≤ 0.44).
  • The deeper model3 did not outperform model2 despite early stopping, so added capacity alone does not help.

CNN¶

Reshape Data for CNN

In [95]:
# Reshape input data to fit the 1D CNN: append a channel axis -> (samples, 40, 1)
# (the extra trailing dimension in a 4D reshape would not match Conv1D's input_shape=(40, 1))
X_train_cnn = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test_cnn = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
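Conv1D expects input of shape (samples, timesteps, channels), so the 40 MFCC features act as 40 "timesteps" with a single channel. A shape check on dummy data:

```python
import numpy as np

# Dummy feature matrix with the same layout as X_train: (samples, 40 MFCCs).
X_dummy = np.zeros((750, 40))

# Append a channel axis: (750, 40) -> (750, 40, 1), matching the
# Conv1D input_shape=(40, 1) used by the model below.
X_dummy_cnn = X_dummy.reshape(X_dummy.shape[0], X_dummy.shape[1], 1)
print(X_dummy_cnn.shape)
```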

Define CNN Model

In [99]:
# Import necessary layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout, BatchNormalization

# Create a Sequential CNN Model (1D for Audio)
model4 = Sequential([
    # First Conv Layer
    Conv1D(32, kernel_size=3, activation='relu', input_shape=(40, 1)),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Dropout(0.3),

    # Second Conv Layer
    Conv1D(64, kernel_size=3, activation='relu'),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Dropout(0.3),

    # Third Conv Layer
    Conv1D(128, kernel_size=3, activation='relu'),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Dropout(0.3),

    # Flatten & Dense Layers
    Flatten(),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.4),

    # Output Layer
    Dense(10, activation='softmax')  # 10 genres
])

# Print Summary
model4.summary()
Model: "sequential_6"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv1d (Conv1D)                      │ (None, 38, 32)              │             128 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_21               │ (None, 38, 32)              │             128 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling1d (MaxPooling1D)         │ (None, 19, 32)              │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_21 (Dropout)                 │ (None, 19, 32)              │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv1d_1 (Conv1D)                    │ (None, 17, 64)              │           6,208 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_22               │ (None, 17, 64)              │             256 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling1d_1 (MaxPooling1D)       │ (None, 8, 64)               │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_22 (Dropout)                 │ (None, 8, 64)               │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv1d_2 (Conv1D)                    │ (None, 6, 128)              │          24,704 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_23               │ (None, 6, 128)              │             512 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling1d_2 (MaxPooling1D)       │ (None, 3, 128)              │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_23 (Dropout)                 │ (None, 3, 128)              │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten_3 (Flatten)                  │ (None, 384)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_25 (Dense)                     │ (None, 128)                 │          49,280 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ batch_normalization_24               │ (None, 128)                 │             512 │
│ (BatchNormalization)                 │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_24 (Dropout)                 │ (None, 128)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_26 (Dense)                     │ (None, 10)                  │           1,290 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 83,018 (324.29 KB)
 Trainable params: 82,314 (321.54 KB)
 Non-trainable params: 704 (2.75 KB)
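The parameter counts in the summary can be verified by hand: a Conv1D layer has one kernel_size × in_channels weight matrix plus a bias per filter, a Dense layer has (inputs + 1) weights per output unit, and BatchNormalization carries four values per channel (gamma and beta trainable, moving mean and variance non-trainable). A quick sanity check in plain Python, with layer sizes taken from the summary above:

```python
def conv1d_params(kernel_size, in_channels, filters):
    # (kernel_size * in_channels) weights + 1 bias, per filter
    return (kernel_size * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # Full weight matrix + 1 bias per output unit
    return (in_units + 1) * out_units

def batchnorm_params(channels):
    # gamma + beta (trainable) and moving mean + variance (non-trainable)
    return 4 * channels

total = (conv1d_params(3, 1, 32)    + batchnorm_params(32)    # 128 + 128
       + conv1d_params(3, 32, 64)   + batchnorm_params(64)    # 6,208 + 256
       + conv1d_params(3, 64, 128)  + batchnorm_params(128)   # 24,704 + 512
       + dense_params(3 * 128, 128) + batchnorm_params(128)   # 49,280 + 512
       + dense_params(128, 10))                               # 1,290

print(total)  # 83018, matching "Total params" above
```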

Compile the Model

In [100]:
# Compile the model
model4.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Train the CNN Model

In [101]:
# Set Hyperparameters
num_epochs = 100
batch_size = 32

# Train Model
history = model4.fit(
    X_train_cnn, Y_train,
    validation_data=(X_test_cnn, Y_test),
    epochs=num_epochs,
    batch_size=batch_size,
    verbose=1
)
Epoch 1/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 6s 32ms/step - accuracy: 0.1527 - loss: 3.1711 - val_accuracy: 0.2320 - val_loss: 2.0631
Epoch 2/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.2434 - loss: 2.3932 - val_accuracy: 0.2640 - val_loss: 1.9939
Epoch 3/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.3181 - loss: 2.2320 - val_accuracy: 0.2960 - val_loss: 1.8749
Epoch 4/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.2766 - loss: 2.1360 - val_accuracy: 0.3840 - val_loss: 1.7817
Epoch 5/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.3433 - loss: 2.0282 - val_accuracy: 0.3960 - val_loss: 1.6828
Epoch 6/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.3638 - loss: 1.9365 - val_accuracy: 0.4200 - val_loss: 1.6278
Epoch 7/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.3853 - loss: 1.7895 - val_accuracy: 0.4760 - val_loss: 1.5540
Epoch 8/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.4155 - loss: 1.7556 - val_accuracy: 0.4560 - val_loss: 1.5050
Epoch 9/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.3891 - loss: 1.7393 - val_accuracy: 0.5160 - val_loss: 1.4592
Epoch 10/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.3921 - loss: 1.7375 - val_accuracy: 0.4920 - val_loss: 1.4594
Epoch 11/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.4178 - loss: 1.6658 - val_accuracy: 0.4720 - val_loss: 1.4474
Epoch 12/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.4323 - loss: 1.5399 - val_accuracy: 0.4760 - val_loss: 1.4415
Epoch 13/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.4399 - loss: 1.5494 - val_accuracy: 0.5160 - val_loss: 1.4006
Epoch 14/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.4365 - loss: 1.6432 - val_accuracy: 0.4840 - val_loss: 1.4029
Epoch 15/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.4640 - loss: 1.5454 - val_accuracy: 0.4760 - val_loss: 1.4068
Epoch 16/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.3907 - loss: 1.6032 - val_accuracy: 0.4800 - val_loss: 1.3779
Epoch 17/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.4534 - loss: 1.4843 - val_accuracy: 0.5040 - val_loss: 1.3506
Epoch 18/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5174 - loss: 1.3946 - val_accuracy: 0.4840 - val_loss: 1.3345
Epoch 19/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step - accuracy: 0.4738 - loss: 1.4918 - val_accuracy: 0.5000 - val_loss: 1.3277
Epoch 20/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 26ms/step - accuracy: 0.4997 - loss: 1.4145 - val_accuracy: 0.5080 - val_loss: 1.3480
Epoch 21/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 28ms/step - accuracy: 0.4691 - loss: 1.4274 - val_accuracy: 0.5160 - val_loss: 1.3297
Epoch 22/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.4945 - loss: 1.4029 - val_accuracy: 0.5320 - val_loss: 1.3464
Epoch 23/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.4806 - loss: 1.4251 - val_accuracy: 0.5440 - val_loss: 1.3265
Epoch 24/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5281 - loss: 1.3386 - val_accuracy: 0.5080 - val_loss: 1.3122
Epoch 25/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.5156 - loss: 1.3522 - val_accuracy: 0.5440 - val_loss: 1.2963
Epoch 26/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5267 - loss: 1.3252 - val_accuracy: 0.5320 - val_loss: 1.3064
Epoch 27/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5816 - loss: 1.1886 - val_accuracy: 0.5200 - val_loss: 1.2934
Epoch 28/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5480 - loss: 1.3097 - val_accuracy: 0.5400 - val_loss: 1.2608
Epoch 29/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.5599 - loss: 1.2541 - val_accuracy: 0.5600 - val_loss: 1.2612
Epoch 30/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.5393 - loss: 1.2577 - val_accuracy: 0.5840 - val_loss: 1.2503
Epoch 31/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5461 - loss: 1.2134 - val_accuracy: 0.5640 - val_loss: 1.2399
Epoch 32/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5881 - loss: 1.1734 - val_accuracy: 0.5560 - val_loss: 1.2310
Epoch 33/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5384 - loss: 1.2615 - val_accuracy: 0.5720 - val_loss: 1.2449
Epoch 34/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5884 - loss: 1.1504 - val_accuracy: 0.5480 - val_loss: 1.2350
Epoch 35/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.6157 - loss: 1.0851 - val_accuracy: 0.5400 - val_loss: 1.2448
Epoch 36/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5705 - loss: 1.1760 - val_accuracy: 0.5520 - val_loss: 1.2239
Epoch 37/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5924 - loss: 1.1912 - val_accuracy: 0.5720 - val_loss: 1.2082
Epoch 38/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5859 - loss: 1.1989 - val_accuracy: 0.5480 - val_loss: 1.2125
Epoch 39/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 23ms/step - accuracy: 0.6000 - loss: 1.1794 - val_accuracy: 0.5640 - val_loss: 1.2124
Epoch 40/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step - accuracy: 0.6050 - loss: 1.1284 - val_accuracy: 0.5760 - val_loss: 1.2092
Epoch 41/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 26ms/step - accuracy: 0.6305 - loss: 1.0575 - val_accuracy: 0.5720 - val_loss: 1.2363
Epoch 42/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 28ms/step - accuracy: 0.5868 - loss: 1.1914 - val_accuracy: 0.5840 - val_loss: 1.2163
Epoch 43/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 25ms/step - accuracy: 0.6044 - loss: 1.1277 - val_accuracy: 0.5680 - val_loss: 1.2146
Epoch 44/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6478 - loss: 1.0258 - val_accuracy: 0.5640 - val_loss: 1.2095
Epoch 45/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6546 - loss: 1.0141 - val_accuracy: 0.5600 - val_loss: 1.1999
Epoch 46/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - accuracy: 0.6218 - loss: 1.0096 - val_accuracy: 0.5880 - val_loss: 1.1766
Epoch 47/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.6084 - loss: 1.0482 - val_accuracy: 0.5880 - val_loss: 1.1559
Epoch 48/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.6548 - loss: 1.0395 - val_accuracy: 0.5720 - val_loss: 1.1851
Epoch 49/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6247 - loss: 1.0283 - val_accuracy: 0.5800 - val_loss: 1.1841
Epoch 50/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6464 - loss: 1.0146 - val_accuracy: 0.5680 - val_loss: 1.1708
Epoch 51/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6199 - loss: 1.0025 - val_accuracy: 0.5840 - val_loss: 1.1751
Epoch 52/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.6699 - loss: 0.9646 - val_accuracy: 0.5680 - val_loss: 1.1832
Epoch 53/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.6821 - loss: 0.9045 - val_accuracy: 0.5720 - val_loss: 1.1738
Epoch 54/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.6697 - loss: 0.9196 - val_accuracy: 0.5840 - val_loss: 1.1756
Epoch 55/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.6458 - loss: 0.9916 - val_accuracy: 0.5760 - val_loss: 1.1929
Epoch 56/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6760 - loss: 0.9553 - val_accuracy: 0.5760 - val_loss: 1.1864
Epoch 57/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7321 - loss: 0.8147 - val_accuracy: 0.5640 - val_loss: 1.1866
Epoch 58/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.6716 - loss: 0.8989 - val_accuracy: 0.5800 - val_loss: 1.1896
Epoch 59/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.6845 - loss: 0.8915 - val_accuracy: 0.5640 - val_loss: 1.1839
Epoch 60/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.7238 - loss: 0.7884 - val_accuracy: 0.5800 - val_loss: 1.1879
Epoch 61/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.7078 - loss: 0.8767 - val_accuracy: 0.5840 - val_loss: 1.1954
Epoch 62/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6876 - loss: 0.8778 - val_accuracy: 0.5720 - val_loss: 1.1893
Epoch 63/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6764 - loss: 0.8922 - val_accuracy: 0.5720 - val_loss: 1.1855
Epoch 64/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - accuracy: 0.7098 - loss: 0.8857 - val_accuracy: 0.5800 - val_loss: 1.1821
Epoch 65/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.7013 - loss: 0.8659 - val_accuracy: 0.5640 - val_loss: 1.1665
Epoch 66/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step - accuracy: 0.7251 - loss: 0.7978 - val_accuracy: 0.5880 - val_loss: 1.1796
Epoch 67/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 28ms/step - accuracy: 0.6739 - loss: 0.9158 - val_accuracy: 0.5880 - val_loss: 1.1667
Epoch 68/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step - accuracy: 0.7181 - loss: 0.8279 - val_accuracy: 0.5760 - val_loss: 1.1665
Epoch 69/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6994 - loss: 0.7929 - val_accuracy: 0.5800 - val_loss: 1.1673
Epoch 70/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.6930 - loss: 0.8592 - val_accuracy: 0.5960 - val_loss: 1.1804
Epoch 71/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6974 - loss: 0.8465 - val_accuracy: 0.5680 - val_loss: 1.1800
Epoch 72/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7237 - loss: 0.7922 - val_accuracy: 0.5760 - val_loss: 1.1803
Epoch 73/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7191 - loss: 0.7505 - val_accuracy: 0.5760 - val_loss: 1.1839
Epoch 74/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.7152 - loss: 0.7647 - val_accuracy: 0.5880 - val_loss: 1.1875
Epoch 75/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7281 - loss: 0.7336 - val_accuracy: 0.5760 - val_loss: 1.1960
Epoch 76/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7193 - loss: 0.7853 - val_accuracy: 0.5800 - val_loss: 1.1985
Epoch 77/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.7062 - loss: 0.8224 - val_accuracy: 0.5880 - val_loss: 1.1981
Epoch 78/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7476 - loss: 0.7051 - val_accuracy: 0.6040 - val_loss: 1.1694
Epoch 79/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.7490 - loss: 0.7548 - val_accuracy: 0.6040 - val_loss: 1.1989
Epoch 80/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7345 - loss: 0.7588 - val_accuracy: 0.5960 - val_loss: 1.1945
Epoch 81/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7210 - loss: 0.7485 - val_accuracy: 0.5880 - val_loss: 1.1921
Epoch 82/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7420 - loss: 0.7601 - val_accuracy: 0.6160 - val_loss: 1.1732
Epoch 83/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7814 - loss: 0.7226 - val_accuracy: 0.6160 - val_loss: 1.1707
Epoch 84/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7798 - loss: 0.6639 - val_accuracy: 0.6000 - val_loss: 1.1937
Epoch 85/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7409 - loss: 0.7301 - val_accuracy: 0.6160 - val_loss: 1.1650
Epoch 86/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7821 - loss: 0.6272 - val_accuracy: 0.5920 - val_loss: 1.1887
Epoch 87/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.7793 - loss: 0.6144 - val_accuracy: 0.6000 - val_loss: 1.1742
Epoch 88/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 25ms/step - accuracy: 0.8048 - loss: 0.6411 - val_accuracy: 0.5960 - val_loss: 1.1944
Epoch 89/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step - accuracy: 0.7585 - loss: 0.6628 - val_accuracy: 0.6280 - val_loss: 1.1864
Epoch 90/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 26ms/step - accuracy: 0.7776 - loss: 0.6323 - val_accuracy: 0.6000 - val_loss: 1.2027
Epoch 91/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step - accuracy: 0.7749 - loss: 0.6442 - val_accuracy: 0.5840 - val_loss: 1.2392
Epoch 92/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 22ms/step - accuracy: 0.8024 - loss: 0.6844 - val_accuracy: 0.6120 - val_loss: 1.2414
Epoch 93/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.7541 - loss: 0.6975 - val_accuracy: 0.5800 - val_loss: 1.2347
Epoch 94/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.8317 - loss: 0.5621 - val_accuracy: 0.6000 - val_loss: 1.2354
Epoch 95/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.8180 - loss: 0.5335 - val_accuracy: 0.5720 - val_loss: 1.2360
Epoch 96/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7829 - loss: 0.6508 - val_accuracy: 0.5960 - val_loss: 1.2347
Epoch 97/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.7943 - loss: 0.6066 - val_accuracy: 0.6080 - val_loss: 1.2209
Epoch 98/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.8113 - loss: 0.5884 - val_accuracy: 0.6080 - val_loss: 1.2130
Epoch 99/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7712 - loss: 0.6637 - val_accuracy: 0.6160 - val_loss: 1.2241
Epoch 100/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.8038 - loss: 0.6003 - val_accuracy: 0.6000 - val_loss: 1.2220
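Validation loss bottoms out near epoch 47 (≈1.156) and then drifts upward while training accuracy keeps climbing, a classic overfitting pattern. The best epoch can be read straight off the `history` object; a minimal sketch with an illustrative loss curve (toy values, not the log above):

```python
# Toy stand-in for history.history["val_loss"] (illustrative values)
val_loss = [1.50, 1.32, 1.21, 1.16, 1.18, 1.22, 1.24]

# Epoch with the lowest validation loss (epochs are 1-based)
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch)  # 4
```

In practice, passing `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)` to `model.fit` would stop training around that point automatically.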
In [102]:
# Make predictions on the test set
Y_pred = model4.predict(X_test)

Y_pred = [np.argmax(i) for i in Y_pred]
WARNING:tensorflow:5 out of the last 17 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x7cda2415c040> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 29ms/step
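`predict` returns one softmax row per sample, so `np.argmax` along axis 1 turns probabilities into class indices. A tiny standalone example with made-up probabilities for three hypothetical classes:

```python
import numpy as np

# Two samples, three classes of softmax output (illustrative values)
probs = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2]])

labels = np.argmax(probs, axis=1)  # index of the largest probability per row
print(labels)  # [1 0]
```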
In [103]:
# Set style as dark
sns.set_style("dark")

# Set figure size
plt.figure(figsize = (15, 8))

# Plot the title
plt.title("Confusion Matrix for Song Genre Prediction")

# Confusion matrix
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)

# Plot the confusion matrix as heatmap. Use BuPu
sns.heatmap(cm, annot = True, cmap = "BuPu", fmt = 'g', cbar = False)

# Set X-label and Y-label
plt.xlabel("PREDICTED VALUES")
plt.ylabel("ACTUAL VALUES")

# Show the plot
plt.show()

# Print the metrics
print(classification_report(Y_test, Y_pred))
[Confusion matrix heatmap for the CNN's genre predictions]
              precision    recall  f1-score   support

           0       0.86      0.50      0.63        24
           1       0.77      0.85      0.81        27
           2       0.39      0.46      0.42        24
           3       0.32      0.26      0.29        23
           4       0.55      0.69      0.61        26
           5       0.56      0.62      0.59        24
           6       0.85      0.82      0.84        28
           7       0.72      0.88      0.79        24
           8       0.57      0.46      0.51        28
           9       0.40      0.36      0.38        22

    accuracy                           0.60       250
   macro avg       0.60      0.59      0.59       250
weighted avg       0.61      0.60      0.59       250
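The report's per-class scores follow directly from the confusion matrix: recall divides each diagonal entry by its row sum (actual counts), precision by its column sum (predicted counts), and accuracy is the diagonal total over all samples. A small sketch with a toy 3-class matrix (values illustrative):

```python
import numpy as np

# Toy confusion matrix: rows = actual class, columns = predicted class
cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 3, 7]])

recall = np.diag(cm) / cm.sum(axis=1)     # true positives / actual per class
precision = np.diag(cm) / cm.sum(axis=0)  # true positives / predicted per class
accuracy = np.trace(cm) / cm.sum()

print(recall)    # [0.8 0.6 0.7]
print(accuracy)  # 0.7
```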

Model Performance Comparison

| Model | Precision | Recall | F1-Score | Accuracy |
|-------|-----------|--------|----------|----------|
| Model 1 (ANN) | 0.58 | 0.57 | 0.57 | 0.57 |
| Model 2 (ANN) | 0.59 | 0.57 | 0.58 | 0.58 |
| Model 3 (ANN) | 0.62 | 0.62 | 0.61 | 0.62 |
| Model 4 (CNN) | 0.61 | 0.60 | 0.59 | 0.60 |

Model Ranking

  1. Model 3 (ANN, deeper network) – Best on every aggregate metric (accuracy 0.62, F1 0.61).
  2. Model 4 (CNN, 1D Conv) – Close second (accuracy 0.60, F1 0.59).
  3. Model 2 (ANN, batch norm & dropout) – Moderate performance.
  4. Model 1 (Basic ANN) – Lowest performance.

Observation:

  • Model 3 (ANN) and Model 4 (CNN) perform best.
  • Model 3 edges out the CNN on every aggregate metric, though only by 0.02–0.03.
  • The CNN reaches strong scores on the easier classes (e.g., 6 and 7) but, like the ANNs, struggles on classes 3 and 9.
  • Given the small gap, Model 3 is the better choice on these numbers; the CNN remains a useful baseline for convolution-friendly inputs such as spectrograms.

Model 5¶

Reshape Data for CNN

In [110]:
X_train_cnn = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)  # (samples, 40, 1)
X_test_cnn = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)  # (samples, 40, 1)
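Conv1D expects a 3-D input of shape (samples, timesteps, channels), so the single trailing axis added here matters. A quick shape check with dummy data (the 100 × 40 shape is illustrative):

```python
import numpy as np

X = np.random.rand(100, 40)  # 100 samples x 40 MFCC features (illustrative)
X_cnn = X.reshape(X.shape[0], X.shape[1], 1)

assert X_cnn.shape == (100, 40, 1)  # (samples, timesteps, channels) for Conv1D
print(X_cnn.shape)
```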

Define CNN Model

In [111]:
model5 = Sequential([
    Conv1D(32, kernel_size=3, activation='relu', input_shape=(40, 1)),  # Ensure input_shape is (40,1)
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Dropout(0.3),

    Conv1D(64, kernel_size=3, activation='relu'),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Dropout(0.3),

    Conv1D(128, kernel_size=3, activation='relu'),
    BatchNormalization(),
    MaxPooling1D(pool_size=2),
    Dropout(0.3),

    Flatten(),
    Dense(128, activation='relu'),
    BatchNormalization(),
    Dropout(0.4),

    Dense(10, activation='softmax')  # 10 output classes
])

Compile the Model

In [112]:
# Compile the model
model5.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Train the CNN Model

In [113]:
# Set Hyperparameters
num_epochs = 100
batch_size = 32

# Train Model
history = model5.fit(
    X_train_cnn, Y_train,
    validation_data=(X_test_cnn, Y_test),
    epochs=num_epochs,
    batch_size=batch_size,
    verbose=1
)
Epoch 1/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 11s 49ms/step - accuracy: 0.1319 - loss: 3.1830 - val_accuracy: 0.2400 - val_loss: 2.1917
Epoch 2/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.2038 - loss: 2.6141 - val_accuracy: 0.3120 - val_loss: 1.9182
Epoch 3/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.3074 - loss: 2.2631 - val_accuracy: 0.3640 - val_loss: 1.7750
Epoch 4/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.2730 - loss: 2.1906 - val_accuracy: 0.3880 - val_loss: 1.6925
Epoch 5/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.3243 - loss: 2.0105 - val_accuracy: 0.4520 - val_loss: 1.6353
Epoch 6/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.3603 - loss: 1.9082 - val_accuracy: 0.4960 - val_loss: 1.5842
Epoch 7/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.3690 - loss: 1.8834 - val_accuracy: 0.4760 - val_loss: 1.5562
Epoch 8/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.3750 - loss: 1.8462 - val_accuracy: 0.4880 - val_loss: 1.5227
Epoch 9/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.3528 - loss: 1.7773 - val_accuracy: 0.4960 - val_loss: 1.4930
Epoch 10/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.4402 - loss: 1.6508 - val_accuracy: 0.5160 - val_loss: 1.4523
Epoch 11/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.4070 - loss: 1.7770 - val_accuracy: 0.5000 - val_loss: 1.4617
Epoch 12/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.4323 - loss: 1.6061 - val_accuracy: 0.5080 - val_loss: 1.4538
Epoch 13/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.4564 - loss: 1.5315 - val_accuracy: 0.5160 - val_loss: 1.4185
Epoch 14/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.4317 - loss: 1.6456 - val_accuracy: 0.5120 - val_loss: 1.3931
Epoch 15/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.4750 - loss: 1.5263 - val_accuracy: 0.5280 - val_loss: 1.3915
Epoch 16/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.4636 - loss: 1.5361 - val_accuracy: 0.5280 - val_loss: 1.3922
Epoch 17/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.4888 - loss: 1.4892 - val_accuracy: 0.5280 - val_loss: 1.3658
Epoch 18/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5159 - loss: 1.4322 - val_accuracy: 0.5200 - val_loss: 1.3524
Epoch 19/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.4780 - loss: 1.4530 - val_accuracy: 0.5200 - val_loss: 1.3561
Epoch 20/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.4846 - loss: 1.5005 - val_accuracy: 0.5240 - val_loss: 1.3658
Epoch 21/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5087 - loss: 1.4123 - val_accuracy: 0.5440 - val_loss: 1.3265
Epoch 22/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5400 - loss: 1.2802 - val_accuracy: 0.5280 - val_loss: 1.3115
Epoch 23/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 28ms/step - accuracy: 0.5479 - loss: 1.3316 - val_accuracy: 0.5560 - val_loss: 1.3159
Epoch 24/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.5662 - loss: 1.2315 - val_accuracy: 0.5440 - val_loss: 1.3033
Epoch 25/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 32ms/step - accuracy: 0.5332 - loss: 1.2953 - val_accuracy: 0.5480 - val_loss: 1.3162
Epoch 26/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 26ms/step - accuracy: 0.5306 - loss: 1.3366 - val_accuracy: 0.5480 - val_loss: 1.2996
Epoch 27/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.5311 - loss: 1.2940 - val_accuracy: 0.5600 - val_loss: 1.2965
Epoch 28/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5729 - loss: 1.2473 - val_accuracy: 0.5480 - val_loss: 1.2779
Epoch 29/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.5358 - loss: 1.3840 - val_accuracy: 0.5520 - val_loss: 1.2810
Epoch 30/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.5562 - loss: 1.2140 - val_accuracy: 0.5720 - val_loss: 1.2628
Epoch 31/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.5854 - loss: 1.2260 - val_accuracy: 0.5720 - val_loss: 1.2538
Epoch 32/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5953 - loss: 1.1874 - val_accuracy: 0.5880 - val_loss: 1.2486
Epoch 33/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5837 - loss: 1.2231 - val_accuracy: 0.5920 - val_loss: 1.2231
Epoch 34/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5539 - loss: 1.2813 - val_accuracy: 0.5920 - val_loss: 1.2306
Epoch 35/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6072 - loss: 1.1923 - val_accuracy: 0.5680 - val_loss: 1.2561
Epoch 36/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.5933 - loss: 1.1734 - val_accuracy: 0.5480 - val_loss: 1.2627
Epoch 37/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6410 - loss: 1.1140 - val_accuracy: 0.5560 - val_loss: 1.2375
Epoch 38/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5768 - loss: 1.1554 - val_accuracy: 0.5840 - val_loss: 1.2224
Epoch 39/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.6302 - loss: 1.1210 - val_accuracy: 0.6000 - val_loss: 1.2261
Epoch 40/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5860 - loss: 1.1728 - val_accuracy: 0.6000 - val_loss: 1.2110
Epoch 41/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.5659 - loss: 1.1502 - val_accuracy: 0.5840 - val_loss: 1.2236
Epoch 42/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6345 - loss: 1.0804 - val_accuracy: 0.5800 - val_loss: 1.2231
Epoch 43/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6009 - loss: 1.1118 - val_accuracy: 0.6000 - val_loss: 1.2147
Epoch 44/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5910 - loss: 1.1329 - val_accuracy: 0.5760 - val_loss: 1.2174
Epoch 45/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6243 - loss: 1.0147 - val_accuracy: 0.5680 - val_loss: 1.2180
Epoch 46/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6303 - loss: 1.0807 - val_accuracy: 0.5840 - val_loss: 1.2113
Epoch 47/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 28ms/step - accuracy: 0.6230 - loss: 1.0242 - val_accuracy: 0.5600 - val_loss: 1.2085
Epoch 48/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 28ms/step - accuracy: 0.6370 - loss: 1.0484 - val_accuracy: 0.5720 - val_loss: 1.2127
Epoch 49/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.6677 - loss: 1.0764 - val_accuracy: 0.5520 - val_loss: 1.2479
Epoch 50/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step - accuracy: 0.6145 - loss: 1.0349 - val_accuracy: 0.5480 - val_loss: 1.2012
Epoch 51/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.6444 - loss: 1.0064 - val_accuracy: 0.5560 - val_loss: 1.2370
Epoch 52/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.6343 - loss: 1.0336 - val_accuracy: 0.5480 - val_loss: 1.2321
Epoch 53/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6599 - loss: 0.9732 - val_accuracy: 0.5600 - val_loss: 1.2066
Epoch 54/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6516 - loss: 0.9959 - val_accuracy: 0.5640 - val_loss: 1.2004
Epoch 55/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6661 - loss: 0.9789 - val_accuracy: 0.5680 - val_loss: 1.2261
Epoch 56/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6645 - loss: 0.9656 - val_accuracy: 0.5800 - val_loss: 1.2136
Epoch 57/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6701 - loss: 0.9608 - val_accuracy: 0.5760 - val_loss: 1.1997
Epoch 58/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6829 - loss: 0.9780 - val_accuracy: 0.5680 - val_loss: 1.2072
Epoch 59/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.6723 - loss: 0.8967 - val_accuracy: 0.5680 - val_loss: 1.2127
Epoch 60/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.6510 - loss: 1.0032 - val_accuracy: 0.5960 - val_loss: 1.1992
Epoch 61/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6844 - loss: 0.8882 - val_accuracy: 0.5920 - val_loss: 1.2095
Epoch 62/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.6956 - loss: 0.8859 - val_accuracy: 0.6000 - val_loss: 1.1949
Epoch 63/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7227 - loss: 0.8781 - val_accuracy: 0.5880 - val_loss: 1.2037
Epoch 64/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6812 - loss: 0.8798 - val_accuracy: 0.5720 - val_loss: 1.2156
Epoch 65/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.6757 - loss: 0.9462 - val_accuracy: 0.5880 - val_loss: 1.1921
Epoch 66/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6960 - loss: 0.8690 - val_accuracy: 0.5720 - val_loss: 1.2277
Epoch 67/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6853 - loss: 0.8997 - val_accuracy: 0.5640 - val_loss: 1.2350
Epoch 68/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.6931 - loss: 0.8683 - val_accuracy: 0.5800 - val_loss: 1.2263
Epoch 69/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.6570 - loss: 0.9462 - val_accuracy: 0.5800 - val_loss: 1.2125
Epoch 70/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.7225 - loss: 0.8557 - val_accuracy: 0.5760 - val_loss: 1.1988
Epoch 71/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 21ms/step - accuracy: 0.7438 - loss: 0.7584 - val_accuracy: 0.5800 - val_loss: 1.1875
Epoch 72/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 26ms/step - accuracy: 0.7136 - loss: 0.7967 - val_accuracy: 0.5800 - val_loss: 1.2136
Epoch 73/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 25ms/step - accuracy: 0.7059 - loss: 0.7942 - val_accuracy: 0.5800 - val_loss: 1.2506
Epoch 74/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 26ms/step - accuracy: 0.6957 - loss: 0.8479 - val_accuracy: 0.5720 - val_loss: 1.2137
Epoch 75/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 28ms/step - accuracy: 0.6820 - loss: 0.8297 - val_accuracy: 0.5720 - val_loss: 1.2268
Epoch 76/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 0.7357 - loss: 0.7884 - val_accuracy: 0.5920 - val_loss: 1.2413
Epoch 77/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.7303 - loss: 0.7509 - val_accuracy: 0.5840 - val_loss: 1.2104
Epoch 78/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.7072 - loss: 0.8316 - val_accuracy: 0.5760 - val_loss: 1.2432
Epoch 79/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7232 - loss: 0.7848 - val_accuracy: 0.5680 - val_loss: 1.2636
Epoch 80/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7426 - loss: 0.7366 - val_accuracy: 0.5640 - val_loss: 1.2432
Epoch 81/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7413 - loss: 0.7741 - val_accuracy: 0.5840 - val_loss: 1.2028
Epoch 82/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7235 - loss: 0.7584 - val_accuracy: 0.5720 - val_loss: 1.2341
Epoch 83/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7424 - loss: 0.7615 - val_accuracy: 0.5520 - val_loss: 1.2321
Epoch 84/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7867 - loss: 0.6474 - val_accuracy: 0.5600 - val_loss: 1.2382
Epoch 85/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7469 - loss: 0.7123 - val_accuracy: 0.5840 - val_loss: 1.2176
Epoch 86/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.7488 - loss: 0.7200 - val_accuracy: 0.5760 - val_loss: 1.2159
Epoch 87/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7307 - loss: 0.7601 - val_accuracy: 0.5680 - val_loss: 1.2168
Epoch 88/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7440 - loss: 0.6830 - val_accuracy: 0.5800 - val_loss: 1.2613
Epoch 89/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step - accuracy: 0.7238 - loss: 0.7883 - val_accuracy: 0.5680 - val_loss: 1.2639
Epoch 90/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7548 - loss: 0.7278 - val_accuracy: 0.5800 - val_loss: 1.2360
Epoch 91/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.7474 - loss: 0.7291 - val_accuracy: 0.5960 - val_loss: 1.2274
Epoch 92/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7510 - loss: 0.6723 - val_accuracy: 0.5960 - val_loss: 1.2114
Epoch 93/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 16ms/step - accuracy: 0.7765 - loss: 0.6599 - val_accuracy: 0.6080 - val_loss: 1.2310
Epoch 94/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7682 - loss: 0.6546 - val_accuracy: 0.6040 - val_loss: 1.2476
Epoch 95/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7498 - loss: 0.7280 - val_accuracy: 0.5720 - val_loss: 1.2807
Epoch 96/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 21ms/step - accuracy: 0.7696 - loss: 0.6917 - val_accuracy: 0.5760 - val_loss: 1.2411
Epoch 97/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 28ms/step - accuracy: 0.7601 - loss: 0.6502 - val_accuracy: 0.5880 - val_loss: 1.2313
Epoch 98/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step - accuracy: 0.7440 - loss: 0.6833 - val_accuracy: 0.5720 - val_loss: 1.2437
Epoch 99/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 20ms/step - accuracy: 0.7922 - loss: 0.6392 - val_accuracy: 0.5840 - val_loss: 1.2336
Epoch 100/100
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.7924 - loss: 0.6201 - val_accuracy: 0.5840 - val_loss: 1.2365
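The log above shows validation loss bottoming out around epoch 71 (≈1.19) while training accuracy keeps climbing toward 0.79, so the final ~30 epochs mostly overfit. A minimal early-stopping sketch, assuming the same Keras setup used for `model5` (the callback variable name is illustrative):

```python
import tensorflow as tf

# Stop once val_loss has not improved for 10 consecutive epochs,
# and roll back to the best weights seen during training.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=10,
    restore_best_weights=True,
)

# Hypothetical usage with the training call from this notebook:
# model5.fit(X_train_cnn, Y_train, validation_data=(X_val_cnn, Y_val),
#            epochs=100, callbacks=[early_stop])
```

With `restore_best_weights=True`, the evaluated model corresponds to the best validation epoch rather than the last one.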

Evaluate and Predict

In [114]:
# Make predictions on the test set
Y_pred = model5.predict(X_test_cnn)
Y_pred = np.argmax(Y_pred, axis=1)  # index of the highest-probability class per sample
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step
In [115]:
# Set style as dark
sns.set_style("dark")

# Set figure size
plt.figure(figsize=(15, 8))

# Plot the title
plt.title("Confusion Matrix for Song Genre Prediction")

# Confusion matrix: sklearn puts true labels on rows, predictions on columns
cm = confusion_matrix([int(x) for x in Y_test], Y_pred)

# Plot the confusion matrix as heatmap
sns.heatmap(cm, annot=True, cmap="BuPu", fmt='g', cbar=False)

# Set X-label and Y-label to match the matrix orientation
plt.xlabel("PREDICTED VALUES")
plt.ylabel("ACTUAL VALUES")

# Show the plot
plt.show()

# Print the metrics
print(classification_report(Y_test, Y_pred))
[Figure: confusion matrix heatmap for the 10 genre classes]
              precision    recall  f1-score   support

           0       0.79      0.46      0.58        24
           1       0.91      0.74      0.82        27
           2       0.50      0.54      0.52        24
           3       0.31      0.39      0.35        23
           4       0.60      0.58      0.59        26
           5       0.48      0.50      0.49        24
           6       0.80      0.86      0.83        28
           7       0.66      0.88      0.75        24
           8       0.58      0.50      0.54        28
           9       0.30      0.32      0.31        22

    accuracy                           0.58       250
   macro avg       0.59      0.58      0.58       250
weighted avg       0.60      0.58      0.59       250

Model Performance Comparison

| Model | Precision | Recall | F1-Score | Accuracy | Macro Avg | Weighted Avg |
|---|---|---|---|---|---|---|
| Model 1 | 0.58 | 0.57 | 0.57 | 0.57 | 0.58 | 0.59 |
| Model 2 | 0.62 | 0.62 | 0.61 | 0.62 | 0.62 | 0.62 |
| Model 3 | 0.59 | 0.57 | 0.57 | 0.57 | 0.58 | 0.59 |
| Model 4 (CNN) | 0.61 | 0.60 | 0.59 | 0.60 | 0.60 | 0.61 |
| Model 5 (Improved CNN) | 0.59 | 0.58 | 0.58 | 0.58 | 0.59 | 0.60 |

Model Rankings:

  1. Model 2 – Best overall performance.
  2. Model 4 (CNN) – Slightly lower but better than ANN models.
  3. Model 5 – Performance close to Model 4.
  4. Model 1 & Model 3 – Lower accuracy than CNN models.

Conclusion & Recommendations¶

Model Ranking

  • Model 2 (Best ANN) – Highest accuracy (0.62), best macro/weighted avg, and most stable class-wise performance.
  • Model 4 (Best CNN) – Close behind (0.60 accuracy), stronger feature extraction but slightly lower recall than Model 2.
  • Model 5 (Improved CNN) – Good feature learning (0.58 accuracy) but slightly weaker than Model 4.
  • Model 3 (Enhanced ANN) – Better than Model 1 but outperformed by CNN models.
  • Model 1 (Baseline ANN) – Lowest accuracy (0.57), struggles with genre differentiation.

CNN models show promise but require optimization to outperform ANN models consistently.

Issues & Fixes

  • Class Imbalance: Some genres (e.g., classical, metal, pop) perform well, while others (e.g., country, reggae) have lower recall. Consider data augmentation or class-weight balancing.
  • Low Performance for Certain Genres: Country (class 3) and Reggae (class 9) are consistently misclassified. More MFCCs or spectral contrast features could improve performance.
  • Overfitting Risk: Training accuracy climbs to ~0.79 while validation accuracy plateaus near 0.58, a clear generalization gap. Adjust dropout rates, batch normalization, early stopping, and data augmentation to improve generalization.
  • Epoch Reset: When training multiple models sequentially, epochs should be reset to prevent unintended training continuation.
  • CNN vs ANN Trade-Off: CNNs extract features better but need deeper architectures or filter tuning to surpass Model 2.
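For the class-imbalance fix above, class weights can be computed with scikit-learn and passed straight to Keras. A minimal sketch with hypothetical integer-encoded genre labels (the `y_train` values here are toy data, not this notebook's split):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced label vector (genre indices); replace with the
# actual Y_train from the notebook's train/test split.
y_train = np.array([0] * 120 + [1] * 80 + [2] * 50 + [3] * 50)

# "balanced" weights each class inversely to its frequency:
# n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(y_train),
                               y=y_train)
class_weight = dict(enumerate(weights))

# Hypothetical usage with this notebook's models:
# model5.fit(X_train_cnn, Y_train, class_weight=class_weight, ...)
```

Rarer classes receive proportionally larger weights, so their misclassifications contribute more to the loss.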

Final Takeaway

  • Model 2 remains the best overall baseline.
  • CNN models (Model 4 & 5) show potential but require further tuning.
  • Address genre misclassification with better MFCC features and augmentation.
  • Optimize CNNs with deeper layers, filter tuning, and learning rate adjustments.
  • Experiment with hybrid architectures (CNN+LSTM) for sequential feature learning.
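One possible shape for the CNN+LSTM hybrid suggested above: convolutional layers extract local spectro-temporal patterns, then an LSTM models the sequence of frames. A minimal Keras sketch, assuming Mel-spectrogram inputs of a hypothetical size (128 time steps × 64 Mel bands); the architecture and layer sizes are illustrative, not tuned:

```python
from tensorflow.keras import layers, models

def build_cnn_lstm(input_shape=(128, 64), n_classes=10):
    # Spectrogram input with a single channel dimension
    inp = layers.Input(shape=input_shape + (1,))
    x = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Collapse frequency and channel axes so each time step is one feature vector
    x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)
    x = layers.LSTM(64)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inp, out)

model = build_cnn_lstm()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The `Reshape` is the key design choice: it turns the pooled feature maps into a (time, features) sequence the LSTM can consume.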
In [117]:
# Export the notebook to HTML and download it (Google Colab)

path_ipynb = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/Music Genre Classification using Deep Learning.ipynb'
notebook_path = path_ipynb

!jupyter nbconvert --to html "{notebook_path}"

from google.colab import files
path_html = '/content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/Music Genre Classification using Deep Learning.html'

files.download(path_html)
[NbConvertApp] Converting notebook /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/Music Genre Classification using Deep Learning.ipynb to html
[NbConvertApp] WARNING | Alternative text is missing on 13 image(s).
[NbConvertApp] Writing 9063356 bytes to /content/drive/MyDrive/My DS DA/Neural Networks (Deep Learning)/Music Classification/Music Genre Classification using Deep Learning.html